Advanced Databases: Ben Stopford

Upload: ben-stopford, posted 27 May 2015


TRANSCRIPT


Data Storage for Extreme Use Cases: The Lay of the Land and a Peek at ODC

Ben Stopford, RBS

How fast is a HashMap lookup?

~20 ns

That's how long it takes light to travel across a room.

How fast is a database lookup?

~20 ms

That's how long it takes light to go to Australia and back, 3 times.

Computers really are very fast

The problem is we're quite good at writing software that slows them down.

Question

Is it fair to compare the performance of a Database with a HashMap?

Of course not…

• Physical Diversity: a database call involves both Network and Disk.
• Functional Diversity: databases provide a wealth of additional features, including persistence, transactions, consistency, etc.

[Figure: a latency scale from picoseconds to milliseconds comparing an L1 cache reference, an L2 cache reference, a main memory reference, reading 1MB from main memory, an Ethernet ping, RDMA over InfiniBand, reading 1MB from disk/Ethernet, and a cross-continental round trip. An L1 reference is about 2 clock cycles, or 0.7ns: the time it takes light to travel 20cm.]

Mechanical Sympathy

Key Point 1

Simple computer programs, operating in a single address space, are extremely fast.

Why are there so many types of database these days? …because we need different architectures for different jobs.

Times are changing

Traditional Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example, IBM's System R).

"Because RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark, then there is no market where they are competitive. As such, they should be considered as legacy technology, more than a quarter of a century in age, for which a complete redesign and re-architecting is the appropriate next step."

Michael Stonebraker (creator of Ingres and Postgres)

The Traditional Architecture

• Data lives on disk.
• Users have an allocated user space where intermediary results are calculated.
• The database brings data, normally via indexes, into memory and performs filters, joins, reordering and aggregation operations.
• The result is sent to the user.

[Diagram: a map of architectures (Traditional, Shared Disk, Shared Nothing, In Memory, Distributed In Memory) arranged against a progressively simpler contract.]

Key Point 2

Different architectural decisions about how we store and access data are needed in different environments. Our 'Context' has changed.

Simplifying the Contract

How big is the internet?

5 exabytes

(which is 5,000 petabytes, or 5,000,000 terabytes)

How big is an average enterprise database?

80% < 1TB (in 2009)

The context of our problem has changed.

Simplifying the Contract

• For some use cases, ACID transactions are overkill.
• Implementing ACID in a distributed architecture has a significant effect on performance.
• This is where the NoSQL movement came from.

Databases have huge operational overheads

Research with the Shore DB indicates only 6.8% of instructions contribute to 'useful work'.

Taken from "OLTP Through the Looking Glass, and What We Found There", Harizopoulos et al.

Avoid that overhead with a simpler contract and by avoiding IO.

Key Point 3

For the very top-end data volumes, a simpler contract is mandatory. ACID is simply not possible.

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows it.

Options for scaling out the traditional architecture

1. The Shared Disk Architecture

[Diagram: several database machines sharing a single disk.]

• More 'grunt'.
• Popular for mid-range data sets.
• Multiple machines must contend for ownership (distributed disk/lock contention).

2. The Shared Nothing Architecture

• Massive storage potential.
• Massive scalability of processing.
• Popular for high-level storage solutions.
• Commodity hardware.
• Around since the '80s, but only really popular since the Big Data era.
• Limited by cross-partition joins.

Each machine is responsible for a subset of the records. Each record exists on only one machine.

[Diagram: a client and several shared-nothing nodes, each owning a disjoint subset of record keys (1, 2, 3…; 97, 98, 99…; 169, 170…; 244, 245…; 333, 334…; 765, 769…).]
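To make the routing idea concrete, here is a minimal Java sketch of how a client might map each record key to the single node that owns it; the node names and the modulo-hash scheme are illustrative assumptions, not details from the talk.

import java.util.List;

// Minimal sketch: route each record key to exactly one node, so every
// record lives on a single machine and no storage is shared between nodes.
public class ShardRouter {

    private final List<String> nodes;   // e.g. host names of the shared-nothing nodes

    public ShardRouter(List<String> nodes) {
        this.nodes = nodes;
    }

    /** The owning node for a record key: the same key always maps to the same node. */
    public String nodeFor(long recordKey) {
        int bucket = Math.floorMod(Long.hashCode(recordKey), nodes.size());
        return nodes.get(bucket);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of("node-1", "node-2", "node-3"));
        System.out.println(router.nodeFor(98));    // each key resolves to exactly one node
        System.out.println(router.nodeFor(334));
    }
}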

3. The In-Memory Database (single address space)

Databases must cache subsets of the data in memory.

[Diagram: a cache holding a subset of the data.]

Not knowing what you don't know

Most queries still go to disk to "see what they missed".

[Diagram: data on disk, with 90% of it in cache.]

If you can fit it ALL in memory, you know everything.

The architecture of an in-memory database

• All data is at your fingertips.
• Query plans become less important, as there is no IO.
• Intermediary results are just pointers.

Memory is at least 100x faster than disk

[Figure: the same latency scale from picoseconds to milliseconds, this time comparing an L1 cache reference, an L2 cache reference, a main memory reference, reading 1MB from main memory, a cross-network round trip, reading 1MB from disk/network, and a cross-continental round trip. An L1 reference is about 2 clock cycles, or 0.7ns: the time it takes light to travel 20cm.]

Random vs Sequential Access

Memory allows random access. Disk only works well for sequential reads.

This makes them very fast.

The proof is in the stats: TPC-H benchmarks on a 1TB data set

• Exasol: 4,253,937 QphH (in-memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH

• NB: TPC-H is a decision-support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's SPARC SuperCluster.

So why haven't in-memory databases taken off?

Address spaces are relatively small and of a finite, fixed size.

• What happens when your data grows beyond your available memory?

The 'One more bit' problem

Durability

What happens when you pull the plug?

One solution is distribution.

Distributed In Memory (Shared Nothing)

Again we spread our data, but this time using only RAM.

[Diagram: a client and several nodes, each holding a disjoint subset of record keys in RAM.]

Distribution solves our two problems

• Solve the 'one more bit' problem by adding more hardware.
• Solve the durability problem with backups on another machine.

We get massive amounts of parallel processing

But at the cost of losing the single address space.

[Diagram: the architecture map again: Traditional, Shared Disk, Shared Nothing, In Memory, Distributed In Memory, arranged against a progressively simpler contract.]

Key Point 4: There are three key forces

Distribution: gain scalability through a distributed architecture.

Simplify the contract: improve scalability by picking appropriate ACID properties.

No disk: all data is held in RAM.

These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse.

ODC

ODC represents a balance between throughput and latency.

What is Latency?

Latency is a measure of response time.

What is Throughput?

Throughput is a measure of the consumption of work/messages in a prescribed amount of time.

Which is best for latency?

[Diagram: a latency spectrum running from a Traditional Database to a Shared Nothing (Distributed) In-Memory Database.]

Which is best for throughput?

[Diagram: a throughput spectrum running from a Traditional Database to a Shared Nothing (Distributed) In-Memory Database.]

So why do we use distributed in-memory?

[Diagram: In Memory paired with plentiful hardware gives both latency and throughput.]

ODC: Distributed, Shared Nothing, In Memory, Semi-Normalised, Realtime Graph DB

• 450 processes
• Messaging (topic-based) as a system of record (persistence)
• 2TB of RAM

The Layers

[Diagram: an Access Layer exposing Java client APIs, over a Query Layer, over a Data Layer holding Transactions, Cashflows and MTMs, backed by a Persistence Layer.]

Three Tools of Distributed Data Architecture

• Indexing
• Replication
• Partitioning

How should we use these tools?

Replication puts data everywhere

Wherever you go, the data will be there. But your storage is limited by the memory on a node.

Partitioning scales. [Diagram: keys Aa-Ap assigned to a single partition.]

Scalable storage, bandwidth and processing.

Associating data in different partitions implies moving it.

So we have some data. Our data is bound together in a model.

[Diagram: a domain model linking Trade, Party, Trader, Desk, Name and Sub entities.]
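As a purely illustrative shape for that model, here is a minimal Java sketch of the normalised entities; the field names are assumptions, and each entity refers to the others by key rather than by nesting, which is what forces the joins discussed next.

// Minimal sketch of the normalised domain model: entities reference each other
// by key only, so each can be stored, versioned and partitioned independently.
public class DomainModelSketch {
    public record Trade(long tradeId, String partyId, String traderId, double notional) {}
    public record Party(String partyId, String name) {}
    public record Trader(String traderId, String name, String deskId) {}
    public record Desk(String deskId, String name) {}
}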

Which we save.

[Diagram: the Trade, Party and Trader entities saved as separate records.]

Binding them back together involves a "distributed join" => lots of network hops.

[Diagram: Trade, Party and Trader being joined back together across machines.]

The hops have to be spread over time

[Diagram: the hops laid out along network and time axes.]

Lots of network hops make it slow.

OK, what if we held it all together? "Denormalised"

Hence denormalisation is FAST (for reads).

Denormalisation implies the duplication of some sub-entities…

…and that means managing consistency over lots of copies…

…and all the duplication means you run out of space really quickly.

Space issues are exacerbated further when data is versioned.

[Diagram: four denormalised copies of the Trade/Party/Trader graph, Version 1 through Version 4.]

…and you need versioning to do MVCC.

And reconstituting a previous time slice becomes very difficult.

[Diagram: separately versioned Trade, Party and Trader records that must be stitched back together to rebuild a point-in-time view.]

So we want to hold entities separately (normalised), to alleviate concerns around consistency and space usage.

Remember this means the object graph will be split across multiple machines.

[Diagram: normalised Trade, Party and Trader entities spread across machines; each is independently versioned and each piece of data is a singleton.]

Binding them back together involves a "distributed join" => lots of network hops.

[Diagram: Trade, Party and Trader joined back together across machines.]

Whereas in the denormalised model the join is already done.

So what we want is the advantages of a normalised store at the speed of a denormalised one.

This is what using Snowflake Schemas and the Connected Replication pattern is all about.

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys.

We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate.

[Diagram: entities related by common keys vs entities related by crosscutting keys.]

We tackle this problem with a hybrid model

[Diagram: Trade is partitioned; Party and Trader are replicated.]

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are big; dimensions are small.

Facts have one key that relates them all (used to partition).

Dimensions have many keys (which crosscut the partitioning key).

Looking at the data:

Facts => big, common keys

Dimensions => small, crosscutting keys

We remember we are a grid. We should avoid the distributed join…

…so we only want to 'join' data that is in the same process.

[Diagram: Trades and MTMs collocated via a common key.]

Use a Key Assignment Policy (e.g. KeyAssociation in Coherence).
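A minimal sketch of such a key assignment policy, assuming Coherence's KeyAssociation interface; the MtmKey class and its fields are invented for illustration. The MTM's cache key declares the id of the Trade it belongs to, so the grid places the MTM in the same partition, and therefore the same process, as its Trade.

import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;

// Minimal sketch: the MTM cache key exposes the trade id as its associated key,
// so the MTM is stored in the same partition as the Trade that owns it.
public class MtmKey implements KeyAssociation, Serializable {

    private final long mtmId;
    private final long tradeId;   // the Fact's partitioning key

    public MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;   // collocate with the Trade that shares this key
    }

    // equals() and hashCode() omitted for brevity; a real cache key needs both.
}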

So we prescribe different physical storage for Facts and Dimensions.

[Diagram: Trade is partitioned; Party and Trader are replicated.]

Facts are partitioned; dimensions are replicated.

[Diagram: the Data Layer holds Transactions, Cashflows and MTMs as partitioned Fact Storage, with the Query Layer above it.]

Facts are partitioned; dimensions are replicated.

[Diagram: Facts (Transactions, Cashflows, MTMs) are distributed/partitioned in Fact Storage, while Dimensions are replicated.]
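A minimal sketch of what this prescription could look like against Coherence-style named caches; the cache names are assumptions, and the mapping of 'trades' to a distributed (partitioned) scheme and 'parties' to a replicated scheme would normally live in the cache configuration rather than in code.

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

// Minimal sketch: facts go into a partitioned cache, dimensions into a replicated one.
// The cache-name-to-scheme mapping is assumed to be defined in the cache configuration.
public class FactDimensionStorageSketch {

    public static void main(String[] args) {
        // Partitioned (distributed) cache: each Trade lives on exactly one node.
        NamedCache trades = CacheFactory.getCache("trades");

        // Replicated cache: every node holds a full copy of the small dimension.
        NamedCache parties = CacheFactory.getCache("parties");

        trades.put(42L, "trade #42 payload");   // stored on the partition that owns key 42
        parties.put("P1", "Goldmans");          // pushed to every node in the cluster

        System.out.println(trades.get(42L));
        System.out.println(parties.get("P1"));
    }
}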

The data volumes back this up as a sensible hypothesis

Facts => big => distribute

Dimensions => small => replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.

[Diagram: replicate vs distribute.]

So how does this help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern?

[Diagram: a chain of round trips spread over time on the network: get Cost Centres, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs.]

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause:

Where Cost Centre = 'CC1'

[Diagram: partitioned storage holding Transactions, Cashflows and MTMs.]

Stage 1: Get the right keys to query the Facts. (Join Dimensions in the Query Layer.)

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

[Diagram: partitioned storage holding Transactions, Cashflows and MTMs.]

Stage 2: Cluster join to get the Facts. (Join Dimensions in the Query Layer; join Facts across the cluster.)

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated.

[Diagram: partitioned storage holding Transactions, Cashflows and MTMs.]

Stage 3: Augment the raw Facts with the relevant Dimensions. (Join Dimensions in the Query Layer; join Facts across the cluster; join Dimensions in the Query Layer again.)

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind the relevant dimensions to the result.
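A minimal, self-contained Java sketch of the three stages; the entity shapes, cache contents and the 'CC1' filter are illustrative assumptions standing in for the replicated dimension caches and the partitioned fact storage.

import java.util.*;
import java.util.stream.Collectors;

// Minimal sketch of the three-stage query:
//   1. resolve the where clause against the replicated dimensions to get fact keys,
//   2. join the facts, which share a partitioning key and so are collocated,
//   3. bind the replicated dimensions onto the raw facts to form the result.
public class ThreeStageQuerySketch {

    record Transaction(long txnId, String costCentreId, double amount) {}
    record Mtm(long txnId, double value) {}

    // Replicated dimension cache: cost centre id -> name (a full copy on every node).
    static final Map<String, String> costCentres = Map.of("CC1", "Rates Desk");

    // Partitioned fact storage, both keyed by the common partitioning key (txnId).
    static final Map<Long, Transaction> transactions = new HashMap<>();
    static final Map<Long, Mtm> mtms = new HashMap<>();

    public static void main(String[] args) {
        transactions.put(1L, new Transaction(1L, "CC1", 100.0));
        mtms.put(1L, new Mtm(1L, 99.5));

        // Stage 1: focus on the where clause, using only local (replicated) dimensions.
        Set<String> wantedCostCentres = Set.of("CC1");

        // Stage 2: find and join the facts; Transactions and MTMs share txnId, so in
        // the real grid this join happens inside a single partition, with no extra hop.
        List<Long> factKeys = transactions.values().stream()
                .filter(t -> wantedCostCentres.contains(t.costCentreId()))
                .map(Transaction::txnId)
                .collect(Collectors.toList());

        // Stage 3: augment the raw facts with the relevant dimension data.
        for (long key : factKeys) {
            Transaction t = transactions.get(key);
            Mtm m = mtms.get(key);
            System.out.println(t + " " + m + ", costCentre=" + costCentres.get(t.costCentreId()));
        }
    }
}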

Bringing it together

[Diagram: a Java client API over Replicated Dimensions and Partitioned Facts.]

We never have to do a distributed join.

So all the big stuff is held partitioned, and we can join without shipping keys around and having intermediate results.

We get to do this… [Diagram: normalised Trade, Party and Trader entities held separately.]

…and this… [Diagram: independently versioned copies of the graph, Versions 1 to 4.]

…and this… [Diagram: reconstituting a previous time slice from the separate entities.]

…without the problems of this…

…or this…

…all at the speed of this… well, almost.

But there is a fly in the ointment…

I lied earlier. These aren't all Facts.

[Diagram: the entities split into Facts and Dimensions.]

This is a dimension:
• It has a different key to the Facts.
• And it's BIG.

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.

Fortunately there is a simple solution

The Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large.

But Connected Dimension Data is tiny by comparison.

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store, we keep our 'Connected Caches' up to date.

[Diagram: the Data Layer holds partitioned Fact Storage (Transactions, Cashflows, MTMs); the Processing Layer holds replicated Dimension Caches.]

As new Facts are added, the relevant Dimensions that they reference are moved to the processing-layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.

Saving a trade causes all its 1st-level references to be triggered.

[Diagram: a 'Save Trade' call writes to the partitioned cache and the cache store fires a trigger. The Trade's references (Party, Alias, Source Book, Ccy) live in the Data Layer (all normalised), below the Query Layer (with connected dimension caches).]

This updates the connected caches.

[Diagram: the Party, Alias, Source Book and Ccy dimensions referenced by the Trade are pushed from the Data Layer (all normalised) into the connected dimension caches in the Query Layer.]

The process recurses through the object graph.

[Diagram: the recursion continues from Party to its own references, such as Alias and LedgerBook, again flowing from the Data Layer (all normalised) into the Query Layer's connected dimension caches.]

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
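A minimal Java sketch of that recursion; the Entity shape (a key plus a list of foreign keys), the dimensionStore and the connectedCache are all illustrative assumptions, standing in for the normalised data layer and the replicated connected-dimension caches.

import java.util.*;

// Minimal sketch of Connected Replication: when a fact is saved, walk its foreign
// keys recursively and push every dimension it touches into the replicated
// 'connected' cache, so only dimensions that are actually used get replicated.
public class ConnectedReplicationSketch {

    record Entity(String key, List<String> foreignKeys) {}

    // Normalised dimension data in the data layer, keyed by dimension key.
    static final Map<String, Entity> dimensionStore = new HashMap<>();

    // Replicated connected-dimension cache in the query/processing layer.
    static final Map<String, Entity> connectedCache = new HashMap<>();

    /** Trigger fired when a fact (e.g. a Trade) is written to the partitioned cache. */
    static void onFactSaved(Entity fact) {
        replicateReferences(fact, new HashSet<>());
    }

    private static void replicateReferences(Entity entity, Set<String> visited) {
        for (String fk : entity.foreignKeys()) {
            if (!visited.add(fk)) {
                continue;                              // already visited on this pass
            }
            Entity dimension = dimensionStore.get(fk);
            if (dimension == null || connectedCache.containsKey(fk)) {
                continue;                              // unknown, or already connected
            }
            connectedCache.put(fk, dimension);         // this dimension is now 'connected'
            replicateReferences(dimension, visited);   // recurse: Party -> Alias, LedgerBook...
        }
    }

    public static void main(String[] args) {
        dimensionStore.put("party:GS", new Entity("party:GS", List.of("ledgerBook:LB1")));
        dimensionStore.put("ledgerBook:LB1", new Entity("ledgerBook:LB1", List.of()));
        dimensionStore.put("party:UNUSED", new Entity("party:UNUSED", List.of()));

        onFactSaved(new Entity("trade:42", List.of("party:GS")));
        System.out.println(connectedCache.keySet());   // only the connected dimensions appear
    }
}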

With 'Connected Replication', only 1/10th of the data needs to be replicated (on average).

Limitations of this approach

• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.

• At one end of the scale are the huge shared-nothing architectures. These favour scalability.

• At the other end are in-memory architectures, ideally using a single address space.

• You can blend the two approaches (for example, ODC).

• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step.

• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com
• Questions?

Page 2: Advanced databases   ben stopford

How fast is a HashMap lookup

~20 ns

Thatrsquos how long it takes light to travel a room

How fast is a database lookup

~20 ms

Thatrsquos how long it takes light to go to Australia and

back

3 times

Computers really are very fast

The problem is wersquore quite good at writing software that

slows them down

Question

Is it fair to compare the performance of a Database with a HashMap

Of course nothellipbull Physical Diversity A database call

involves both Network and Diskbull Functional Diversity Databases provide a

wealth of additional features including persistence transactions consistency etc

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Ethernet ping

Cross Continental Round Trip

1MB DiskEthernet

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

RDMA over Infiniband

Mechanical Sympathy

Key Point 1

Simple computer programs operating in a single address space

are extremely fast

Why are there so many types of database these dayshellipbecause we need different architectures for different jobs

Times are changing

Traditional Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)

ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 3: Advanced databases   ben stopford

Thatrsquos how long it takes light to travel a room

How fast is a database lookup

~20 ms

Thatrsquos how long it takes light to go to Australia and

back

3 times

Computers really are very fast

The problem is wersquore quite good at writing software that

slows them down

Question

Is it fair to compare the performance of a Database with a HashMap

Of course nothellipbull Physical Diversity A database call

involves both Network and Diskbull Functional Diversity Databases provide a

wealth of additional features including persistence transactions consistency etc

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Ethernet ping

Cross Continental Round Trip

1MB DiskEthernet

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

RDMA over Infiniband

Mechanical Sympathy

Key Point 1

Simple computer programs operating in a single address space

are extremely fast

Why are there so many types of database these dayshellipbecause we need different architectures for different jobs

Times are changing

Traditional Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)

ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2: Cluster join to get the Facts: having joined Dimensions in the Query Layer, join the Facts across the cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently as we know they are collocated

Stage 3: Augment the raw Facts with relevant Dimensions: join Dimensions in the Query Layer, join Facts across the cluster, then join Dimensions in the Query Layer again

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result
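As a rough illustration of those three stages, here is a self-contained sketch in which plain in-memory maps stand in for the replicated dimension caches and the partitioned fact store; all class, field and cache names are hypothetical:

    import java.util.*;
    import java.util.stream.*;

    public class StagedQuery {

        record Transaction(long tradeId, String bookId) {}
        record Mtm(long tradeId, double value) {}

        // Replicated dimensions (small, crosscutting keys) - available in every process.
        static final Map<String, Set<String>> booksByCostCentre =
                Map.of("CC1", Set.of("BOOK-7"));
        static final Map<String, String> ledgerNameByBook =
                Map.of("BOOK-7", "Rates Ledger");

        // Partitioned facts (big, related through the tradeId partitioning key).
        static final Map<Long, Transaction> transactionsByTradeId =
                Map.of(1L, new Transaction(1L, "BOOK-7"));
        static final Map<Long, Mtm> mtmsByTradeId =
                Map.of(1L, new Mtm(1L, 42.0));

        public static void main(String[] args) {
            // Stage 1: resolve the where clause purely against replicated dimensions.
            Set<String> books = booksByCostCentre.getOrDefault("CC1", Set.of());

            // Stage 2: join the facts. In the real grid this runs inside each partition,
            // because Transactions and MTMs sharing a tradeId are collocated.
            List<String> result = transactionsByTradeId.values().stream()
                    .filter(t -> books.contains(t.bookId()))
                    .map(t -> {
                        Mtm mtm = mtmsByTradeId.get(t.tradeId());
                        // Stage 3: augment the raw fact row with replicated dimension data.
                        String ledger = ledgerNameByBook.get(t.bookId());
                        return "trade=" + t.tradeId() + " mtm=" + mtm.value() + " ledger=" + ledger;
                    })
                    .collect(Collectors.toList());

            result.forEach(System.out::println);
        }
    }

The point of the sketch is only the shape of the work: the dimension joins touch nothing but local, replicated data, and the single fact join runs where the facts already live.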

Bringing it together

(Diagram: Java client API over Replicated Dimensions and Partitioned Facts)

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and having intermediate results

We get to do this…

(Diagram: normalised Trade, Party and Trader held separately)

…and this…

(Diagram: independently versioned entities, Version 1 to Version 4)

…and this

(Diagram: many Trades sharing the same Party and Trader entities)

…without the problems of this…

…or this

…all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier: these aren't all Facts

Facts

Dimensions

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate 'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date

(Diagram: Data Layer with partitioned Fact Storage for Transactions, Cashflows and MTMs; Processing Layer with replicated Dimension Caches)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st-level references to be triggered

(Diagram: a Save Trade call hits the partitioned cache in the Data Layer (all normalised); a cache-store trigger then pushes the Trade's referenced dimensions, such as Party, Alias, Source, Book and Ccy, towards the Query Layer's connected dimension caches)

This updates the connected caches


The process recurses through the object graph

(Diagram: the recursion continues from the Trade's first-level references (Party, Alias, Source, Book, Ccy) to second-level ones such as LedgerBook, keeping the Query Layer's connected dimension caches up to date)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
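As a rough sketch of that recursion (the Entity interface, cache and method names here are invented for illustration and are not the ODC API):

    import java.util.*;

    // Hypothetical sketch of the Connected Replication trigger. Entities expose
    // the dimensions they reference; saving a fact walks that graph and copies
    // every reached ("connected") dimension into the replicated dimension caches.
    public class ConnectedReplication {

        interface Entity {
            String key();
            List<Entity> references();   // the foreign-key arcs in the domain model
        }

        // Stand-in for the replicated dimension caches in the query layer.
        static final Map<String, Entity> connectedDimensionCache = new HashMap<>();

        // Called (for example from a cache-store trigger) whenever a fact is written.
        static void onFactSaved(Entity fact) {
            for (Entity dimension : fact.references()) {
                replicateConnected(dimension, new HashSet<>());
            }
        }

        // Recurse through the object graph, replicating each dimension exactly once.
        static void replicateConnected(Entity dimension, Set<String> visited) {
            if (!visited.add(dimension.key())) {
                return;                      // already visited on this pass
            }
            connectedDimensionCache.put(dimension.key(), dimension);
            for (Entity next : dimension.references()) {
                replicateConnected(next, visited);
            }
        }
    }

Only dimensions actually reachable from a saved fact ever enter the replicated caches, which is why the replicated footprint stays an order of magnitude smaller than the full dimension data.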

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)

Limitations of this approach

• Data set size: the size of connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability

Conclusion

At the other end are in-memory architectures, ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning we can do any join in a single step

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern

The End
• Further details online: http://www.benstopford.com
• Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 4: Advanced databases   ben stopford

How fast is a database lookup

~20 ms

Thatrsquos how long it takes light to go to Australia and

back

3 times

Computers really are very fast

The problem is wersquore quite good at writing software that

slows them down

Question

Is it fair to compare the performance of a Database with a HashMap

Of course nothellipbull Physical Diversity A database call

involves both Network and Diskbull Functional Diversity Databases provide a

wealth of additional features including persistence transactions consistency etc

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Ethernet ping

Cross Continental Round Trip

1MB DiskEthernet

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

RDMA over Infiniband

Mechanical Sympathy

Key Point 1

Simple computer programs operating in a single address space

are extremely fast

Why are there so many types of database these dayshellipbecause we need different architectures for different jobs

Times are changing

Traditional Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)

ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 5: Advanced databases   ben stopford

Thatrsquos how long it takes light to go to Australia and

back

3 times

Computers really are very fast

The problem is wersquore quite good at writing software that

slows them down

Question

Is it fair to compare the performance of a Database with a HashMap

Of course nothellipbull Physical Diversity A database call

involves both Network and Diskbull Functional Diversity Databases provide a

wealth of additional features including persistence transactions consistency etc

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Ethernet ping

Cross Continental Round Trip

1MB DiskEthernet

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

RDMA over Infiniband

Mechanical Sympathy

Key Point 1

Simple computer programs operating in a single address space

are extremely fast

Why are there so many types of database these dayshellipbecause we need different architectures for different jobs

Times are changing

Traditional Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)

ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 6: Advanced databases   ben stopford

Options for scaling out the traditional architecture

1. The Shared Disk Architecture

• More 'grunt'
• Popular for mid-range data sets
• Multiple machines must contend for ownership (distributed disk/lock contention)

2. The Shared Nothing Architecture

• Massive storage potential
• Massive scalability of processing
• Popular for high-level storage solutions
• Commodity hardware
• Around since the 80s, but only really popular since the Big Data era
• Limited by cross-partition joins

Each machine is responsible for a subset of the records. Each record exists on only one machine.

[Diagram: key ranges (1, 2, 3…; 97, 98, 99…; 169, 170…; 244, 245…; 333, 334…; 765, 769…) spread across nodes, with a Client routed to the owning node]
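
As a rough sketch (a plain modulo hash, not any particular product's routing algorithm), record ownership in a shared-nothing store can be derived from the record key:

    // Toy key-to-node routing: each key maps to exactly one node.
    public class PartitionRouting {
        public static int ownerOf(Object key, int nodeCount) {
            // Math.floorMod keeps the bucket non-negative even for negative hash codes.
            return Math.floorMod(key.hashCode(), nodeCount);
        }

        public static void main(String[] args) {
            int nodes = 5;
            for (long recordId : new long[] {1, 97, 169, 244, 333, 765}) {
                System.out.println("record " + recordId + " lives on node " + ownerOf(recordId, nodes));
            }
        }
    }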

3. The In-Memory Database (single address space)

Databases must cache subsets of the data in memory.

Not knowing what you don't know: with, say, 90% of the data in cache and the rest on disk, most queries still go to disk to "see what they missed".

If you can fit it ALL in memory, you know everything.

The architecture of an in-memory database:
• All data is at your fingertips
• Query plans become less important, as there is no IO
• Intermediary results are just pointers

Memory is at least 100x faster than disk

[Diagram: the latency ladder from ps to ms – L1 cache ref, L2 cache ref, main memory ref, 1MB from main memory, cross-network round trip, cross-continental round trip, 1MB from disk/network. An L1 ref is about 2 clock cycles, or 0.7ns: the time it takes light to travel 20cm.]

Random vs Sequential Access: memory allows random access; disk only works well for sequential reads.

This makes them very fast

The proof is in the stats. TPC-H benchmarks on a 1TB data set:
• Exasol: 4,253,937 QphH (in-memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH

NB – TPC-H is a decision-support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's SPARC SuperCluster.

So why haven't in-memory databases taken off?

Address spaces are relatively small and of a finite, fixed size.
• What happens when your data grows beyond your available memory? The 'one more bit' problem.

Durability: what happens when you pull the plug?

One solution is distribution

Distributed In-Memory (Shared Nothing)

Again we spread our data, but this time using only RAM.

[Diagram: the same key ranges spread across nodes and accessed by a Client, now held entirely in memory]

Distribution solves our two problems:
• Solve the 'one more bit' problem by adding more hardware
• Solve the durability problem with backups on another machine

We get massive amounts of parallel processing, but at the cost of losing the single address space.

[Diagram: the landscape of options – Traditional, Shared Disk, Shared Nothing, In-Memory, Distributed In-Memory, Simpler Contract]

Key Point 4: There are three key forces
• Distribution: gain scalability through a distributed architecture
• Simplify the contract: improve scalability by picking appropriate ACID properties
• No disk: all data is held in RAM

These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse.

ODC

ODC represents a balance between throughput and latency.

What is latency? Latency is a measure of response time.

What is throughput? Throughput is a measure of the work (messages) consumed in a prescribed amount of time.

Which is best for latency? [Diagram: a spectrum running from the traditional database to the shared-nothing (distributed) in-memory database]

Which is best for throughput? [Diagram: the same spectrum, ranked for throughput]

So why do we use distributed in-memory? [Diagram: in-memory storage plus plentiful hardware delivers both low latency and high throughput]

ODC – a distributed, shared-nothing, in-memory, semi-normalised, realtime graph DB:
• 450 processes
• 2TB of RAM
• Messaging (topic-based) as a system of record (persistence)

The Layers

[Diagram: an Access Layer of Java client APIs, over a Query Layer, over a Data Layer holding Transactions, Cashflows and MTMs, backed by a Persistence Layer]

Three Tools of Distributed Data Architecture: Indexing, Replication and Partitioning.

How should we use these tools?

Replication puts data everywhere: wherever you go, the data will be there. But your storage is limited by the memory on a node.

Partitioning scales (e.g. keys Aa–Ap on one node, and so on): scalable storage, bandwidth and processing. But associating data in different partitions implies moving it.
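
For concreteness, here is a hedged sketch of the three tools in Oracle Coherence terms (the cache names and the getCostCentre accessor are made up, and whether a cache is partitioned or replicated is decided by its cache-scheme configuration rather than by this code):

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;
    import com.tangosol.util.extractor.ReflectionExtractor;

    public class ThreeToolsSketch {
        public static void main(String[] args) {
            // Partitioning: "trades" would be backed by a distributed (partitioned) scheme,
            // so each node owns a slice of the keys.
            NamedCache trades = CacheFactory.getCache("trades");

            // Replication: "parties" would be backed by a replicated scheme,
            // so every node holds a full copy.
            NamedCache parties = CacheFactory.getCache("parties");

            // Indexing: index an attribute so filters need not deserialise and scan every entry.
            trades.addIndex(new ReflectionExtractor("getCostCentre"), false, null);
        }
    }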

So we have some data. Our data is bound together in a model.

[Diagram: a Trade referencing Party and Trader, which in turn reference sub-entities such as Desk and Name]

Which we save.

[Diagram: the Trade, Party and Trader objects written out as separate entries]

Binding them back together involves a "distributed join" => lots of network hops.

[Diagram: the Trade, Party and Trader entries sit on different machines and must be fetched across the network]

The hops have to be spread over time. [Diagram: the hops laid out along network and time axes]

Lots of network hops makes it slow.

OK – what if we held it all together, "denormalised"?

Hence denormalisation is FAST (for reads).

Denormalisation implies the duplication of some sub-entities…

…and that means managing consistency over lots of copies…

…and all the duplication means you run out of space really quickly.

Space issues are exacerbated further when data is versioned.

[Diagram: the Trade/Party/Trader graph duplicated as Version 1, Version 2, Version 3 and Version 4]

…and you need versioning to do MVCC.

And reconstituting a previous time slice becomes very difficult.

[Diagram: mismatched versions of Trade, Party and Trader entities that must be stitched back together]

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

[Diagram: the Trade, Party and Trader entities spread across machines – each entity independently versioned, and each piece of data a singleton]

Binding them back together involves a "distributed join" => lots of network hops.

[Diagram: the Trade, Party and Trader entries again fetched from different machines]

Whereas in the denormalised model the join is already done.

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern is all about.

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys.

We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate.

[Diagram: entities related by common keys versus entities related by crosscutting keys]

We tackle this problem with a hybrid model: the Trade is partitioned, while Party and Trader are replicated.

We adapt the concept of a Snowflake Schema, taking the concept of Facts and Dimensions. Everything starts from a Core Fact (Trades, for us).

Facts are big; dimensions are small.

Facts have one key that relates them all (used to partition). Dimensions have many keys (which crosscut the partitioning key).

Looking at the data: Facts => big, common keys. Dimensions => small, crosscutting keys.

We remember we are a grid. We should avoid the distributed join… so we only want to 'join' data that is in the same process.

Trades and MTMs share a common key, so use a key assignment policy (e.g. KeyAssociation in Coherence) to keep them in the same partition.
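
A minimal sketch of such a policy, assuming a Coherence-style KeyAssociation interface (the MtmKey class and its fields are illustrative, not the actual ODC key classes):

    import com.tangosol.net.cache.KeyAssociation;

    // Composite key: every MTM hashes to the partition that owns its parent Trade.
    public class MtmKey implements KeyAssociation, java.io.Serializable {
        private final long mtmId;
        private final long tradeId;   // the partitioning key shared with the Trade

        public MtmKey(long mtmId, long tradeId) {
            this.mtmId = mtmId;
            this.tradeId = tradeId;
        }

        // Coherence partitions on this value rather than on the key itself,
        // so Trades and their MTMs are collocated and can be joined in-process.
        public Object getAssociatedKey() {
            return tradeId;
        }

        public boolean equals(Object o) {
            return o instanceof MtmKey
                    && ((MtmKey) o).mtmId == mtmId
                    && ((MtmKey) o).tradeId == tradeId;
        }

        public int hashCode() {
            return (int) (mtmId * 31 + tradeId);
        }
    }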

So we prescribe different physical storage for Facts and Dimensions: Facts are partitioned, dimensions are replicated.

[Diagram: Trade held in partitioned storage; Party and Trader replicated]

[Diagram: in the Data Layer, Transactions, Cashflows and MTMs live in partitioned Fact storage (Facts: distribute/partition); the dimensions are replicated into the Query Layer (Dimensions: replicate)]

The data volumes back this up as a sensible hypothesis: Facts => big => distribute. Dimensions => small => replicate.

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate the small stuff whose keys can't map to our partitioning key.

So how does this help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern?

[Diagram: a chain of round trips spread over the network and over time – get Cost Centres, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs, get Cost Centres again]

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause (Where Cost Centre = 'CC1'). The Dimensions are joined in the Query Layer to get the right keys to query the Facts.

Stage 2: Cluster join to get the Facts. The facts are joined together efficiently, as we know they are collocated within the partitioned storage that holds Transactions, Cashflows and MTMs.

Stage 3: Augment the raw Facts with the relevant Dimensions, binding them to the result back in the Query Layer.
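
A toy, self-contained sketch of the three stages (plain Java collections stand in for the replicated dimension caches and the partitioned fact store; every name here is illustrative rather than ODC's actual API):

    import java.util.*;
    import java.util.stream.*;

    public class ThreeStageQuerySketch {

        record Book(String bookId, String costCentreId) {}                  // replicated dimension
        record Transaction(long tradeId, String bookId, double notional) {} // partitioned fact
        record Mtm(long tradeId, double value) {}                           // partitioned fact
        record Row(Transaction txn, Mtm mtm, Book book) {}

        public static void main(String[] args) {
            // Replicated dimension cache: tiny, held on every node.
            Map<String, Book> books = Map.of(
                    "B1", new Book("B1", "CC1"),
                    "B2", new Book("B2", "CC2"));

            // Partitioned fact caches, both keyed by the common partitioning key (tradeId).
            Map<Long, Transaction> transactions = Map.of(
                    1L, new Transaction(1L, "B1", 1_000_000),
                    2L, new Transaction(2L, "B2", 500_000));
            Map<Long, Mtm> mtms = Map.of(1L, new Mtm(1L, 12_345), 2L, new Mtm(2L, -9_876));

            // Stage 1: resolve the where clause against the replicated dimensions (a local lookup).
            Set<String> booksInCC1 = books.values().stream()
                    .filter(b -> b.costCentreId().equals("CC1"))
                    .map(Book::bookId)
                    .collect(Collectors.toSet());

            // Stage 2: join the facts; in the grid this runs inside each partition,
            // because Transactions and MTMs share the tradeId partitioning key.
            // Stage 3: bind the relevant dimensions to the result in the query layer.
            List<Row> result = transactions.values().stream()
                    .filter(t -> booksInCC1.contains(t.bookId()))
                    .map(t -> new Row(t, mtms.get(t.tradeId()), books.get(t.bookId())))
                    .collect(Collectors.toList());

            result.forEach(System.out::println);
        }
    }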

Bringing it together: the Java client API sits over replicated Dimensions and partitioned Facts.

We never have to do a distributed join. So all the big stuff is held partitioned, and we can join without shipping keys around and having intermediate results.

We get to do this… [Diagram: normalised Trade, Party and Trader entities held separately]

…and this… [Diagram: the graph versioned as Version 1, 2, 3 and 4]

…and this… [Diagram: earlier versions of Trade, Party and Trader recombined into a previous time slice]

…without the problems of this…

…or this…

…all at the speed of this… well, almost.

But there is a fly in the ointment… I lied earlier: these aren't all Facts.

[Diagram: the data set split into Facts and Dimensions]

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication pattern.

Whilst there are lots of these big dimensions, a large majority are never used: they are not all "connected". If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large. But Connected Dimension data is tiny by comparison.

So we only replicate 'Connected' (or 'Used') dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date.

[Diagram: the Data Layer holds Transactions, Cashflows and MTMs in partitioned Fact storage; the Processing Layer holds the replicated Dimension caches]

As new Facts are added, the relevant Dimensions that they reference are moved to the processing layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change. Saving a trade causes all its first-level references to be triggered.

[Diagram: a Save Trade into the partitioned cache fires a cache store trigger; the Trade's first-level references (Party Alias, Source Book, Ccy) are pushed from the normalised Data Layer into the connected dimension caches of the Query Layer]

This updates the connected caches. [Diagram: the referenced dimensions now sit in the Query Layer's connected dimension caches]

The process recurses through the object graph. [Diagram: second-level references such as Party and LedgerBook are pulled in as well]

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
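
A minimal sketch of that recursion, under the assumption of a generic entity type with navigable references (Entity, getReferences() and the dimensionCache map are illustrative stand-ins, not ODC classes):

    import java.util.*;

    // Toy Connected Replication: when a fact is saved, walk its outbound references and
    // copy each dimension it touches into the replicated ('connected') cache.
    class ConnectedReplicationSketch {

        interface Entity {
            Object key();
            List<Entity> getReferences();   // the 'arcs' on the domain model
        }

        private final Map<Object, Entity> dimensionCache = new HashMap<>(); // stands in for the replicated layer

        // Invoked by the cache-store trigger when a fact (e.g. a Trade) is written.
        void onFactSaved(Entity fact) {
            for (Entity dimension : fact.getReferences()) {
                replicateConnected(dimension, new HashSet<>());
            }
        }

        // Recurse through the object graph so only 'connected' dimensions are replicated.
        private void replicateConnected(Entity dimension, Set<Object> visited) {
            if (!visited.add(dimension.key())) {
                return;   // guard against cycles and repeats on this pass
            }
            dimensionCache.put(dimension.key(), dimension);
            for (Entity next : dimension.getReferences()) {
                replicateConnected(next, visited);
            }
        }
    }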

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End
• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 7: Advanced databases   ben stopford

Computers really are very fast

The problem is wersquore quite good at writing software that

slows them down

Question

Is it fair to compare the performance of a Database with a HashMap

Of course nothellipbull Physical Diversity A database call

involves both Network and Diskbull Functional Diversity Databases provide a

wealth of additional features including persistence transactions consistency etc

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Ethernet ping

Cross Continental Round Trip

1MB DiskEthernet

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

RDMA over Infiniband

Mechanical Sympathy

Key Point 1

Simple computer programs operating in a single address space

are extremely fast

Why are there so many types of database these dayshellipbecause we need different architectures for different jobs

Times are changing

Traditional Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)

ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 8: Advanced databases   ben stopford

The problem is wersquore quite good at writing software that

slows them down

Question

Is it fair to compare the performance of a Database with a HashMap

Of course nothellipbull Physical Diversity A database call

involves both Network and Diskbull Functional Diversity Databases provide a

wealth of additional features including persistence transactions consistency etc

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Ethernet ping

Cross Continental Round Trip

1MB DiskEthernet

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

RDMA over Infiniband

Mechanical Sympathy

Key Point 1

Simple computer programs operating in a single address space

are extremely fast

Why are there so many types of database these dayshellipbecause we need different architectures for different jobs

Times are changing

Traditional Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)

ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Save Trade → Partitioned Cache → Cache Store → Trigger.

Data Layer (All Normalised): Trade, Party, Alias, SourceBook, Ccy. Query Layer (With connected dimension Caches).

This updates the connected caches: the first-level dimensions (Party, Alias, SourceBook, Ccy) are copied from the Data Layer (all normalised) into the Query Layer's connected dimension caches.

The process recurses through the object graph: from the Trade to Party, Alias, SourceBook and Ccy, and on to the entities they reference (e.g. Party → LedgerBook), again from the Data Layer (all normalised) into the Query Layer's connected dimension caches.

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
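A rough sketch of that recursion, assuming a trigger fired by the fact cache store when a Trade is saved; it walks the referenced dimensions and copies anything newly 'connected' into the replicated caches. The Entity interface and all names are illustrative only, not the real ODC code.

import java.util.*;

// Illustrative domain contract: each entity exposes the dimensions it references.
interface Entity {
    String key();
    List<Entity> references();   // the "arcs" on the domain model
}

class ConnectedReplication {
    // Stand-in for the replicated, connected-dimension caches in the query layer
    private final Map<String, Entity> connectedDimensions = new HashMap<>();

    // Invoked by the trigger/cache store when a fact (e.g. a Trade) is saved
    void onFactSaved(Entity fact) {
        for (Entity dimension : fact.references()) {
            replicateConnected(dimension);
        }
    }

    // Recurse through the foreign keys, replicating only dimensions that are
    // actually reachable ("connected") from facts held in the store.
    private void replicateConnected(Entity dimension) {
        if (connectedDimensions.containsKey(dimension.key())) {
            return;                        // already connected: stop recursing
        }
        connectedDimensions.put(dimension.key(), dimension);
        for (Entity next : dimension.references()) {
            replicateConnected(next);      // e.g. Trade -> Party -> LedgerBook
        }
    }
}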

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:

• Data set size: the size of connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low-latency or very high-throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning (partitioned Fact storage) so we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End
• Further details online: http://www.benstopford.com
• Questions?



Page 11: Advanced databases   ben stopford

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Ethernet ping

Cross Continental Round Trip

1MB DiskEthernet

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

RDMA over Infiniband

Mechanical Sympathy

Key Point 1

Simple computer programs operating in a single address space

are extremely fast

Why are there so many types of database these dayshellipbecause we need different architectures for different jobs

Times are changing

Traditional Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)

ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

[Diagram: the same record ranges, now held only in RAM across machines, accessed by a Client]

Distribution solves our two problems

• Solve the 'one more bit' problem by adding more hardware

• Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of losing the single address space

[Diagram: architecture options again – Traditional, Shared Disk, Shared Nothing, In Memory, Distributed In Memory, Simpler Contract]

Key Point 4: There are three key forces

• Distribution: gain scalability through a distributed architecture

• Simplify the contract: improve scalability by picking appropriate ACID properties

• No Disk: all data is held in RAM

These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse

ODC

ODC represents a balance between throughput and latency

What is Latency?

Latency is a measure of response time

What is Throughput?

Throughput is a measure of the consumption of work/messages in a prescribed amount of time
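A rough worked example (the numbers are purely illustrative): if a single request takes 20ms end to end, its latency is 20ms; if the system can work on 100 such requests in parallel, its throughput is roughly 100 / 0.02s = 5,000 requests per second. Latency is about making one message fast; throughput is about the aggregate rate.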

Which is best for latency?

[Diagram: latency scale from Traditional Database to Shared Nothing (Distributed) In-Memory Database]

Which is best for throughput?

[Diagram: throughput scale from Traditional Database to Shared Nothing (Distributed) In-Memory Database]

So why do we use distributed in-memory?

[Diagram: In Memory + Plentiful hardware => Latency + Throughput]

ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record (persistence)

2TB of RAM

The Layers

[Diagram: Access Layer (Java client APIs), Query Layer, Data Layer (Transactions, Cashflows, Mtms), Persistence Layer]

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools?

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scales [Diagram: keys Aa-Ap assigned to one partition]

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some data. Our data is bound together in a model

[Domain model diagram: Trade linked to Party and Trader, with Desk, Name and Sub-entities]
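As a rough sketch (class and field names are illustrative, not ODC's actual model), the normalised form keeps only foreign keys on the fact, so each entity can live and be versioned independently:

    // Illustrative normalised model: the Trade references its Party and Trader by key.
    class Trade  { long tradeId; long partyId; long traderId; }
    class Party  { long partyId; String name; long deskId; }
    class Trader { long traderId; String name; long deskId; }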

Which we save

[Diagram: the Trade, Party and Trader entities written to the store]

Binding them back together involves a "distributed join" => lots of network hops

[Diagram: Trade, Party and Trader fetched from different machines]

The hops have to be spread over time [Diagram: hops laid out along Network and Time axes]

Lots of network hops make it slow
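In client code the problem looks something like the sketch below (a hypothetical grid API, reusing the illustrative classes above): each get() may be answered by a different machine, so the lookups become sequential network round trips.

    // Hypothetical grid client: every get() may hit a different node.
    interface Grid {
        <T> T get(String cacheName, Object key, Class<T> type);
    }

    class TradeAssembler {
        Trade loadTradeGraph(Grid grid, long tradeId) {
            Trade trade   = grid.get("trades",  tradeId,        Trade.class);   // hop 1
            Party party   = grid.get("parties", trade.partyId,  Party.class);   // hop 2
            Trader trader = grid.get("traders", trade.traderId, Trader.class);  // hop 3
            // Three sequential round trips just to rebuild one small object graph;
            // party and trader would be bound onto the result here.
            return trade;
        }
    }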

OK – what if we held it all together? "Denormalised"

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some sub-entities

…and that means managing consistency over lots of copies

…and all the duplication means you run out of space really quickly

Space issues are exacerbated further when data is versioned

[Diagram: the denormalised Trade/Party/Trader document duplicated at Version 1, Version 2, Version 3 and Version 4]

…and you need versioning to do MVCC

And reconstituting a previous time slice becomes very difficult

[Diagram: mismatched Trade, Party and Trader versions scattered across the copies]

So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

[Diagram: normalised Trade, Party and Trader held separately – Independently Versioned, Data is a Singleton]

Binding them back together involves a "distributed join" => lots of network hops

[Diagram: Trade, Party and Trader again fetched from different machines]

Whereas in the denormalised model the join is already done

So what we want are the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern is all about

Looking more closely: Why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys

We can collocate data with common keys, but if they crosscut the only way to collocate is to replicate

[Diagram: Common Keys vs Crosscutting Keys]

We tackle this problem with a hybrid model

[Diagram: Trade partitioned; Party and Trader replicated]

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are big, dimensions are small

Facts have one key that relates them all (used to partition)

Dimensions have many keys (which crosscut the partitioning key)

Looking at the data:

Facts => Big, common keys

Dimensions => Small, crosscutting keys

We remember we are a grid. We should avoid the distributed join

… so we only want to 'join' data that is in the same process

[Diagram: Trades and MTMs collocated via a Common Key]

Use a Key Assignment Policy (e.g. KeyAssociation in Coherence)
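As a sketch of what that looks like in Coherence (the MtmKey class and its fields are assumptions for illustration; only the KeyAssociation interface itself is Coherence's), the key of the child fact returns the parent Trade's key so both hash to the same partition:

    import com.tangosol.net.cache.KeyAssociation;

    // Illustrative key for an MTM entry: Coherence co-locates it with any entry
    // whose key equals the associated key (here, the owning trade's id).
    public class MtmKey implements KeyAssociation {
        private final long mtmId;
        private final long tradeId;   // the common, partitioning key

        public MtmKey(long mtmId, long tradeId) {
            this.mtmId = mtmId;
            this.tradeId = tradeId;
        }

        public Object getAssociatedKey() {
            return tradeId;            // MTMs land in the same partition as their Trade
        }

        // equals() and hashCode() omitted for brevity; real cache keys need both.
    }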

So we prescribe different physical storage for Facts and Dimensions

[Diagram: Trade partitioned; Party and Trader replicated]

Facts are partitioned, dimensions are replicated

[Diagram: Data Layer with Fact Storage (Partitioned) holding Transactions, Cashflows and Mtms; Query Layer holding the replicated Trade, Party and Trader dimensions]

Facts are partitioned, dimensions are replicated

[Diagram: Facts (distribute/partition) – Transactions, Cashflows and Mtms in partitioned Fact Storage; Dimensions (replicate)]

The data volumes back this up as a sensible hypothesis

Facts => Big => Distribute

Dimensions => Small => Replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key

[Diagram: Replicate vs Distribute]

So how does this help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern?

[Diagram: a chain of sequential calls – Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centers – each one a hop across the Network, spread over Time]

But by balancing Replication and Partitioning we don't need all those hops

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'

[Diagram: Partitioned Storage holding Transactions, Cashflows and Mtms]

Stage 1: Get the right keys to query the Facts – Join Dimensions in the Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

[Diagram: Partitioned Storage holding Transactions, Cashflows and Mtms]

Stage 2: Cluster Join to get Facts – Join Dimensions in the Query Layer, then Join Facts across the cluster

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated

[Diagram: Partitioned Storage holding Transactions, Cashflows and Mtms]

Stage 3: Augment raw Facts with relevant Dimensions – Join Dimensions in the Query Layer, Join Facts across the cluster, then Join Dimensions in the Query Layer again

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result
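Putting the three stages into one sketch (all of the cache and method names below are hypothetical, invented to illustrate the flow rather than ODC's real API):

    import java.util.List;
    import java.util.Set;

    // Hypothetical stand-ins for the replicated dimension caches and the
    // partitioned fact storage described above.
    interface DimensionCaches {
        Set<Long> factKeysForCostCentre(String costCentre);   // resolves the where clause locally
        List<Object> bindDimensions(List<Object> rawFacts);   // enriches facts with reference data
    }
    interface FactGrid {
        List<Object> collocatedJoin(Set<Long> factKeys);      // joins facts inside each partition
    }

    class QueryLayer {
        private final DimensionCaches dimensions;
        private final FactGrid facts;

        QueryLayer(DimensionCaches dimensions, FactGrid facts) {
            this.dimensions = dimensions;
            this.facts = facts;
        }

        // Select Transaction, MTM, ReferenceData ... Where Cost Centre = 'CC1'
        List<Object> queryByCostCentre(String costCentre) {
            Set<Long> keys = dimensions.factKeysForCostCentre(costCentre); // Stage 1: local dimension join
            List<Object> raw = facts.collocatedJoin(keys);                 // Stage 2: in-partition fact join
            return dimensions.bindDimensions(raw);                         // Stage 3: bind dimensions to result
        }
    }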

Bringing it together

[Diagram: Java client API over Replicated Dimensions and Partitioned Facts]

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and having intermediate results

We get to do this… [Diagram: normalised Trade, Party and Trader entities]

…and this… [Diagram: Trade, Party and Trader at Versions 1–4]

…and this… [Diagram: the mixed-version Trade/Party/Trader graph from the time-slice example]

…without the problems of this… …or this…

…all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts

[Diagram: Facts vs Dimensions]

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"

If there are no Trades for Goldmans in the data store, then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data, some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate 'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date

[Diagram: Data Layer with Fact Storage (Partitioned) holding Transactions, Cashflows and Mtms; Processing Layer with replicated Dimension Caches]

As new Facts are added, relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st level references to be triggered

[Diagram: Save Trade hits the Partitioned Cache in the Data Layer (All Normalised); a Cache Store Trigger pushes the referenced Party, Alias, Source Book and Ccy towards the Query Layer (with connected dimension Caches)]

This updates the connected caches

[Diagram: the first-level references (Party, Alias, Source Book, Ccy) now sit in the Query Layer's connected dimension caches]

The process recurses through the object graph

[Diagram: the recursion continues – the Party pulls in its LedgerBook, which is also replicated to the Query Layer]

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
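A minimal sketch of the recursion (the Entity interface, cache maps and method names are all illustrative; in practice this would hang off the cache-store trigger shown above):

    import java.util.Map;

    // When a fact is written, walk its foreign keys and copy each referenced
    // ('connected') dimension into the replicated caches, then recurse.
    class ConnectedReplicator {
        interface Entity { Iterable<Object> foreignKeys(); }

        private final Map<Object, Entity> normalisedStore;   // data layer (partitioned)
        private final Map<Object, Entity> replicatedCaches;  // query layer (replicated)

        ConnectedReplicator(Map<Object, Entity> store, Map<Object, Entity> caches) {
            this.normalisedStore = store;
            this.replicatedCaches = caches;
        }

        void onFactWritten(Entity fact) {
            for (Object key : fact.foreignKeys()) {
                replicateIfConnected(key);
            }
        }

        private void replicateIfConnected(Object key) {
            if (replicatedCaches.containsKey(key)) {
                return;                                      // already connected, stop recursing
            }
            Entity dimension = normalisedStore.get(key);
            if (dimension == null) {
                return;
            }
            replicatedCaches.put(key, dimension);            // it is now a 'connected' dimension
            for (Object next : dimension.foreignKeys()) {
                replicateIfConnected(next);                  // recurse through the arcs of the model
            }
        }
    }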

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)

Limitations of this approach

• Data set size: the size of the connected dimensions limits scalability

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning we can do any join in a single step

[Diagram: Partitioned Storage]

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern

The End

• Further details online: http://www.benstopford.com

• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 12: Advanced databases   ben stopford

Key Point 1

Simple computer programs operating in a single address space

are extremely fast

Why are there so many types of database these dayshellipbecause we need different architectures for different jobs

Times are changing

Traditional Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)

ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 13: Advanced databases   ben stopford

Why are there so many types of database these dayshellipbecause we need different architectures for different jobs

Times are changing

Traditional Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)

ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

Page 16: Advanced databases   ben stopford

ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo

Michael Stonebraker (Creator of Ingres and Postgres)

The Traditional Architecture

bull Data lives on diskbull Users have an allocated user

space where intermediary results are calculated

bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations

bull The result is sent to the user

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and having intermediate results.

We get to do this… (hold Trade, Party and Trader normalised, as singletons)

…and this… (version Trade, Party and Trader independently: Version 1, 2, 3, 4)

…and this… (reconstitute a previous time slice across many Trades, Parties and Traders)

…without the problems of this…
…or this…
…all at the speed of this… well, almost.

But there is a fly in the ointment…
I lied earlier. These aren't all Facts.

Facts vs Dimensions:
This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.

Fortunately there is a simple solution

The Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date.
(Diagram: the Data Layer holds Fact Storage (Partitioned) with Transactions, Cashflows and MTMs; the Processing Layer holds the replicated Dimension Caches.)

As new Facts are added, relevant Dimensions that they reference are moved to processing-layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st-level references to be triggered.

(Diagram: Save Trade hits the Partitioned Cache in the Data Layer (all normalised); a Cache Store trigger fires for the Trade's references, Party, Alias, Source Book and Ccy, feeding the Query Layer's connected dimension caches.)

This updates the connected caches

(The same references, Party, Alias, Source Book and Ccy, are now present in the Query Layer's connected dimension caches.)

The process recurses through the object graph

(The recursion continues into second-level references, such as the Party's LedgerBook, from the normalised Data Layer into the Query Layer's connected dimension caches.)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
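A hedged sketch of that recursion in plain Java (a map stands in for the replicated caches, and the Entity interface and node helper are assumptions made for illustration; the real wiring sits behind the partitioned cache's store/trigger): when a fact is written, walk its references and replicate each connected dimension, recursing into that dimension's own references.

import java.util.*;

// Illustrative sketch of Connected Replication: when a fact is written,
// recurse through its references and replicate only the dimensions it
// actually "connects" to. A map stands in for the replicated caches.
public class ConnectedReplicationSketch {

    interface Entity {
        Object key();
        List<Entity> references();   // the arcs of the domain model
    }

    private final Map<Object, Entity> replicatedDimensions = new HashMap<>();

    // Called from the fact store's write trigger (e.g. a cache store or
    // trigger on the partitioned trade cache).
    public void onFactWrite(Entity fact) {
        for (Entity dimension : fact.references()) {
            replicate(dimension);
        }
    }

    private void replicate(Entity dimension) {
        // Stop if this dimension is already in the connected caches,
        // otherwise shared references would be revisited forever.
        if (replicatedDimensions.putIfAbsent(dimension.key(), dimension) != null) {
            return;
        }
        for (Entity next : dimension.references()) {
            replicate(next);         // recurse through the object graph
        }
    }

    // Tiny demo: a trade referencing a party, which references a ledger book.
    public static void main(String[] args) {
        Entity book = node("LB-1");
        Entity party = node("Party-GS", book);
        Entity trade = node("Trade-10", party);

        ConnectedReplicationSketch sketch = new ConnectedReplicationSketch();
        sketch.onFactWrite(trade);
        // Prints the connected dimensions only (Party-GS and LB-1).
        System.out.println(sketch.replicatedDimensions.keySet());
    }

    private static Entity node(String key, Entity... refs) {
        return new Entity() {
            public Object key() { return key; }
            public List<Entity> references() { return List.of(refs); }
        };
    }
}

The putIfAbsent check is what keeps the replicated layer down to 'connected' data only: dimensions that no fact ever references are never pulled across, and shared references are not revisited.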

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way.
• By balancing Replication and Partitioning, we can do any join in a single step against Partitioned Storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End
• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 18: Advanced databases   ben stopford

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 19: Advanced databases   ben stopford

Key Point 2

Different architectural decisions about how we store and access data are needed in different

environments Our lsquoContextrsquo has

changed

Simplifying the Contract

How big is the internet

5 exabytes

(which is 5000 petabytes or

5000000 terabytes)

How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK - what if we held it all together, "denormalised"? Hence denormalisation is FAST (for reads).

Denormalisation implies the duplication of some sub-entities…
…and that means managing consistency over lots of copies…
…and all the duplication means you run out of space really quickly.
Space issues are exacerbated further when data is versioned.

(Diagram: the denormalised Trade, Party, Trader blob duplicated as Version 1, Version 2, Version 3 and Version 4.)
…and you need versioning to do MVCC.

And reconstituting a previous time slice becomes very difficult. (Diagram: overlapping versions of Trade, Party and Trader entities.)

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

(Diagram: Trade, Party and Trader held separately across the grid: independently versioned, each piece of data a singleton.)

Binding them back together involves a "distributed join" => lots of network hops.


Whereas in the denormalised model the join is already done.

So what we want are the advantages of a normalised store at the speed of a denormalised one.

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys.

We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate. (Diagram: common keys vs crosscutting keys.)

We tackle this problem with a hybrid model (diagram: Trade partitioned; Party and Trader replicated).

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are big; dimensions are small.

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)
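As a rough illustration of those shapes (the field names are assumptions, not ODC's real model): the fact is big and carries the one key everything partitions on, while dimensions are small and are reached via many crosscutting keys.

// Illustrative shapes only; not ODC's actual classes.
record TradeFact(String tradeId,        // partitioning key, shared with MTMs and Cashflows
                 String counterpartyId, // crosscutting dimension keys...
                 String traderId,
                 String bookId,
                 double notional) {}

record CounterpartyDim(String counterpartyId, String name) {}
record TraderDim(String traderId, String name, String deskId) {}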

Looking at the data: Facts => big, common keys. Dimensions => small, crosscutting keys.

We remember we are a grid. We should avoid the distributed join… so we only want to 'join' data that is in the same process.

(Diagram: Trades and MTMs share a common key.) Use a key assignment policy (e.g. KeyAssociation in Coherence).
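One way such a policy is commonly expressed in Coherence is a key class that implements KeyAssociation, so entries with the same associated key land in the same partition. The sketch below is illustrative only (class and field names are not ODC's), assuming Coherence's com.tangosol.net.cache.KeyAssociation interface.

import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;

// Illustrative composite key for an MTM entry. Returning the parent trade id
// from getAssociatedKey() asks Coherence to place this entry in the same
// partition as the Trade it belongs to, so the Trade/MTM join stays local.
public class MtmKey implements KeyAssociation, Serializable {
    private final String mtmId;
    private final String tradeId;   // the partitioning (association) key

    public MtmKey(String mtmId, String tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;             // collocate with the parent Trade
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MtmKey)) return false;
        MtmKey other = (MtmKey) o;
        return mtmId.equals(other.mtmId) && tradeId.equals(other.tradeId);
    }

    @Override
    public int hashCode() {
        return mtmId.hashCode() * 31 + tradeId.hashCode();
    }
}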

So we prescribe different physical storage for Facts and Dimensions (diagram: Trade partitioned; Party and Trader replicated). Facts are partitioned, dimensions are replicated.

Facts are partitioned, dimensions are replicated. (Diagram: across the Data and Query Layers, the Transactions, Cashflows and MTMs caches form the partitioned Fact storage, so Facts are distributed/partitioned, while dimensions such as Party and Trader are replicated.)

The data volumes back this up as a sensible hypothesis: Facts => big => distribute. Dimensions => small => replicate.

Key Point: we use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate the small stuff whose keys can't map to our partitioning key.

So how does this help us to run queries without distributed joins?
This query involves: • joins between Dimensions • joins between Facts.
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern? (Diagram: a chain of round trips spread over the network and over time: get Cost Centres, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs, get Cost Centres.)

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'. (Diagram: Transactions, Cashflows and MTMs in partitioned storage.)

Stage 1: Get the right keys to query the Facts. (Join Dimensions in the Query Layer.)
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
(Diagram: Transactions, Cashflows and MTMs in partitioned storage.)

Stage 2: Cluster join to get Facts. (Dimensions already joined in the Query Layer; now join Facts across the cluster.)
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 2: Join the facts together efficiently, as we know they are collocated. (Diagram: Transactions, Cashflows and MTMs in partitioned storage.)

Stage 3: Augment raw Facts with relevant Dimensions. (Join Dimensions in the Query Layer; join Facts across the cluster; join Dimensions in the Query Layer.)
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 3: Bind relevant dimensions to the result.
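A toy, single-process sketch of the three stages (all names, and the dimension chain collapsed into one Book lookup, are assumptions, not ODC's API):

import java.util.*;
import java.util.stream.*;

public class ThreeStageQuerySketch {
    record Trade(String tradeId, String bookId) {}     // fact, partitioned by tradeId
    record Mtm(String tradeId, double value) {}        // fact, collocated with its Trade
    record Book(String bookId, String costCentre) {}   // replicated dimension

    public static void main(String[] args) {
        List<Book> books = List.of(new Book("B1", "CC1"), new Book("B2", "CC2"));
        Map<String, Trade> trades = Map.of("T1", new Trade("T1", "B1"),
                                           "T2", new Trade("T2", "B2"));
        Map<String, Mtm> mtms = Map.of("T1", new Mtm("T1", 1000.0),
                                       "T2", new Mtm("T2", -250.0));

        // Stage 1: join the replicated dimensions in the query layer, turning
        // "Cost Centre = 'CC1'" into the dimension keys the facts reference.
        Set<String> bookIds = books.stream()
                .filter(b -> b.costCentre().equals("CC1"))
                .map(Book::bookId).collect(Collectors.toSet());

        // Stage 2: join the facts; Trade and MTM share tradeId, so in the real
        // grid this join never leaves a partition.
        List<String> hits = trades.values().stream()
                .filter(t -> bookIds.contains(t.bookId()))
                .map(Trade::tradeId).toList();

        // Stage 3: bind the relevant dimension data onto the result.
        for (String id : hits) {
            System.out.println(trades.get(id) + " " + mtms.get(id) + " costCentre=CC1");
        }
    }
}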

Bringing it together. (Diagram: a Java client API querying Replicated Dimensions and Partitioned Facts.)

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do this… (diagram: normalised Trade, Party and Trader entities held separately)
…and this… (diagram: independent versions, Version 1 through Version 4)
…and this (diagram: reconstituting a previous time slice from the separate entities)
…without the problems of this…
…or this…
all at the speed of this… well, almost.

But there is a fly in the ointment… I lied earlier: these aren't all Facts. (Diagram: Facts vs Dimensions.) This is a dimension: • it has a different key to the Facts • and it's BIG.

We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected". If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large. But Connected Dimension Data is tiny by comparison. One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date. (Diagram: the Data Layer holds the Transactions, Cashflows and MTMs caches as partitioned Fact storage; the Processing Layer holds replicated Dimension caches.)
As new Facts are added, relevant Dimensions that they reference are moved to processing-layer caches.
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.

Saving a trade causes all its 1st-level references to be triggered. (Diagram: a Save Trade call hits the partitioned cache in the normalised Data Layer; a cache-store trigger picks up the Trade's references: Party, Alias, Source, Book and Ccy.)

This updates the connected caches. (Diagram: Party, Alias, Source, Book and Ccy now also appear in the Query Layer's connected dimension caches.)

The process recurses through the object graph. (Diagram: the recursion continues from Party to LedgerBook and beyond, updating the Query Layer's connected dimension caches.)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
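A minimal sketch of that recursion (the Entity shape, the flat foreign-key list and all names are assumptions made for illustration, not ODC's implementation):

import java.util.*;

public class ConnectedReplicationSketch {
    // Minimal entity: an id plus the ids of the dimensions it references.
    record Entity(String id, List<String> foreignKeys) {}

    private final Map<String, Entity> normalisedStore;                   // data layer (all normalised)
    private final Map<String, Entity> connectedCache = new HashMap<>();  // replicated 'connected' dimensions

    ConnectedReplicationSketch(Map<String, Entity> normalisedStore) {
        this.normalisedStore = normalisedStore;
    }

    // Called when a fact (e.g. a Trade) is saved: walk its foreign keys and
    // pull every reachable dimension into the connected cache.
    void onFactSaved(Entity fact) {
        for (String fk : fact.foreignKeys()) {
            replicateConnected(fk);
        }
    }

    private void replicateConnected(String id) {
        if (connectedCache.containsKey(id)) return;   // already replicated
        Entity dim = normalisedStore.get(id);
        if (dim == null) return;
        connectedCache.put(id, dim);                  // replicate this dimension
        for (String fk : dim.foreignKeys()) {         // recurse through the graph
            replicateConnected(fk);
        }
    }
}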

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:
• Data set size: the size of connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability.

Conclusion

At the other end are in-memory architectures, ideally using a single address space.

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning we can do any join in a single step. (Diagram: partitioned storage.)

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End. • Further details online: http://www.benstopford.com • Questions?






How big is an average enterprise database

80 lt 1TB(in 2009)

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

[Chart: access latencies spanning ps, ns, μs and ms – L1 cache ref, L2 cache ref, main memory ref, 1MB from main memory, cross-network round trip, 1MB from disk/network, cross-continental round trip]

An L1 ref is about 2 clock cycles, or 0.7ns. This is the time it takes light to travel 20cm.

Random vs Sequential Access: memory allows random access; disk only works well for sequential reads.

This makes in-memory databases very fast.

The proof is in the stats. TPC-H benchmarks on a 1TB data set:
• Exasol: 4,253,937 QphH (in-memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH

NB – TPC-H is a decision-support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's SPARC SuperCluster.

So why haven't in-memory databases taken off?

Address spaces are relatively small and of a finite, fixed size.
• What happens when your data grows beyond your available memory? The 'one more bit' problem.

Durability: what happens when you pull the plug?

One solution is distribution.

Distributed In-Memory (Shared Nothing): again we spread our data, but this time only using RAM.

[Diagram: a client routing requests to several in-memory nodes, each owning a disjoint key range]

Distribution solves our two problems:
• Solve the 'one more bit' problem by adding more hardware
• Solve the durability problem with backups on another machine

We get massive amounts of parallel processing, but at the cost of losing the single address space.

[Diagram: the design space – Traditional, Shared Disk, Shared Nothing, In Memory, Distributed In Memory – arranged against a 'simpler contract' axis]

Key Point 4: there are three key forces.

• Distribution: gain scalability through a distributed architecture
• Simplify the contract: improve scalability by picking appropriate ACID properties
• No disk: all data is held in RAM

These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse.

ODC

ODC represents a balance between throughput and latency.

What is latency? Latency is a measure of response time.

What is throughput? Throughput is a measure of the amount of work (messages) consumed in a prescribed amount of time.
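As a small illustration of the difference between the two measures, the sketch below times a dummy request in a loop: latency is the time each individual response takes, throughput is how many requests are consumed per second overall. The request itself is a placeholder and the numbers it prints are not ODC figures.

import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Illustrative only: latency = per-request response time; throughput = work consumed per unit time. */
public final class LatencyVsThroughput {
    public static void main(String[] args) {
        Supplier<Integer> request = () -> 1 + 1; // stand-in for a call to the data store

        int requests = 100_000;
        List<Long> latenciesNanos = new ArrayList<>(requests);
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            long t0 = System.nanoTime();
            request.get();
            latenciesNanos.add(System.nanoTime() - t0); // latency of this one call
        }
        double elapsedSeconds = (System.nanoTime() - start) / 1e9;

        double meanLatencyMicros = latenciesNanos.stream()
                .mapToLong(Long::longValue).average().orElse(0) / 1e3;
        double throughputPerSecond = requests / elapsedSeconds; // requests consumed per second

        System.out.printf("mean latency: %.2f us, throughput: %.0f req/s%n",
                meanLatencyMicros, throughputPerSecond);
    }
}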

Which is best for latency?

[Diagram: a latency spectrum running from the Traditional Database to the Shared Nothing (Distributed) In-Memory Database]

Which is best for throughput?

[Diagram: a throughput spectrum running from the Traditional Database to the Shared Nothing (Distributed) In-Memory Database]

So why do we use distributed in-memory? In-memory, on plentiful hardware, gives us both latency and throughput.

ODC – a distributed, shared nothing, in-memory, semi-normalised, realtime graph DB:

• 450 processes
• 2TB of RAM
• Messaging (topic based) as a system of record (persistence)

[Diagram – The Layers: an Access Layer exposing a Java client API, a Query Layer, a Data Layer holding Transactions, Cashflows and MTMs, and a Persistence Layer]

Three Tools of Distributed Data Architecture: Indexing, Replication and Partitioning.

How should we use these tools?

Replication puts data everywhere: wherever you go, the data will be there. But your storage is limited by the memory on a node.

Partitioning scales: scalable storage, bandwidth and processing. But associating data in different partitions implies moving it. [Diagram: keys Aa–Ap assigned to one partition]
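A minimal sketch of the trade-off the two tools make, using plain Java maps to stand in for a cluster (node names and data are assumptions): replication copies everything to every node so reads are always local but the memory cost multiplies by the node count, while partitioning stores each key exactly once so memory scales out but relating keys on different nodes costs a hop.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of replication vs partitioning across a three-node 'cluster'. */
public final class ReplicationVsPartitioning {
    public static void main(String[] args) {
        List<String> nodes = List.of("node1", "node2", "node3");

        // Replication: every node holds a full copy -> any node can serve any read locally,
        // but total memory used is (data size) x (number of nodes).
        Map<String, Map<String, String>> replicated = new HashMap<>();
        for (String node : nodes) {
            replicated.put(node, new HashMap<>(Map.of("AA1", "dimension-a", "AP9", "dimension-b")));
        }

        // Partitioning: each key lives on exactly one node -> storage and processing scale
        // with the cluster, but associating keys held on different nodes means a network hop.
        Map<String, Map<String, String>> partitioned = new HashMap<>();
        nodes.forEach(n -> partitioned.put(n, new HashMap<>()));
        put(partitioned, nodes, "trade-1", "fact-1");
        put(partitioned, nodes, "trade-97", "fact-97");

        System.out.println("replicated:  " + replicated);
        System.out.println("partitioned: " + partitioned);
    }

    private static void put(Map<String, Map<String, String>> cluster, List<String> nodes,
                            String key, String value) {
        String owner = nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
        cluster.get(owner).put(key, value);
    }
}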

So we have some data. Our data is bound together in a model.

[Object model: a Trade references a Party and a Trader, which in turn reference sub-entities such as Desk and Name]

Which we save.

[Diagram: the Trade, Party and Trader objects saved to different nodes of the cluster]

Binding them back together involves a "distributed join" => lots of network hops.
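To see why those hops hurt, here is a deliberately naive sketch of the distributed join: the entity names come from the talk's example, but the caches and the remoteGet helper are invented for illustration. Every reference that lives on another node costs a round trip, and a query over thousands of trades multiplies those round trips.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of the distributed-join problem: one remote lookup per referenced entity. */
public final class NaiveDistributedJoin {
    // Stand-ins for partitioned caches living on different nodes (assumption for the sketch).
    static final Map<String, String> TRADES =
            new ConcurrentHashMap<>(Map.of("T1", "trade T1 [party=P1, trader=TR1]"));
    static final Map<String, String> PARTIES =
            new ConcurrentHashMap<>(Map.of("P1", "party P1"));
    static final Map<String, String> TRADERS =
            new ConcurrentHashMap<>(Map.of("TR1", "trader TR1"));

    /** Each call models a network hop to whichever node owns that key. */
    static String remoteGet(Map<String, String> cache, String key) {
        return cache.get(key); // in a real grid this is a cross-node call, not a local read
    }

    public static void main(String[] args) {
        // Re-binding one Trade to its Party and Trader costs three hops.
        String trade  = remoteGet(TRADES, "T1");
        String party  = remoteGet(PARTIES, "P1");
        String trader = remoteGet(TRADERS, "TR1");
        System.out.println(trade + " | " + party + " | " + trader);
    }
}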

The hops have to be spread over time: [Diagram: the Trade, Party and Trader lookups laid out as successive calls on a network/time axis]. Lots of network hops makes it slow.

OK – what if we held it all together, "denormalised"?

Hence denormalisation is FAST (for reads).

Denormalisation implies the duplication of some sub-entities…

…and that means managing consistency over lots of copies…

…and all the duplication means you run out of space really quickly.
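A small sketch of what that duplication looks like in practice (types invented for illustration): the denormalised shape embeds a copy of the Party and Trader data in every Trade, so reads need no join but every copy has to be kept consistent; the normalised shape holds each sub-entity once and references it by key.

import java.util.List;

/** Illustrative contrast between a denormalised and a normalised shape for the same data. */
public final class DenormalisedVsNormalised {
    // Denormalised: the Trade embeds full copies of its sub-entities (fast reads, duplicated data).
    record DenormalisedTrade(String tradeId, String partyName, String partyAddress, String traderName) {}

    // Normalised: the Trade holds only references; Party and Trader exist exactly once.
    record Party(String partyId, String name, String address) {}
    record Trader(String traderId, String name) {}
    record Trade(String tradeId, String partyId, String traderId) {}

    public static void main(String[] args) {
        List<DenormalisedTrade> denormalised = List.of(
                new DenormalisedTrade("T1", "ACME", "1 Main St", "alice"),
                new DenormalisedTrade("T2", "ACME", "1 Main St", "alice")); // sub-entities duplicated

        Party acme = new Party("P1", "ACME", "1 Main St");                  // single copy
        List<Trade> normalised = List.of(new Trade("T1", "P1", "TR1"), new Trade("T2", "P1", "TR1"));

        System.out.println(denormalised);
        System.out.println(acme + " " + normalised);
    }
}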

Space issues are exaggerated further when data is versioned.

[Diagram: the whole denormalised Trade–Party–Trader blob duplicated as Version 1, Version 2, Version 3 and Version 4]

…and you need versioning to do MVCC.

And reconstituting a previous time slice becomes very difficult.

[Diagram: picking matching versions of Trade, Party and Trader out of many copies to rebuild a point-in-time view]
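A minimal sketch of the versioning idea (not ODC's API; the class and method names are invented): keeping every version of an entity lets an MVCC reader ask for the value "as of" a point in time, which is exactly the time-slice reconstruction that becomes painful once the versions are buried inside denormalised blobs.

import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Illustrative versioned store: read any entity as of an earlier version (time slice). */
public final class VersionedStore<K, V> {
    // key -> (version -> value); versions are kept so readers can see a consistent snapshot
    private final Map<K, NavigableMap<Long, V>> data = new HashMap<>();

    public void put(K key, long version, V value) {
        data.computeIfAbsent(key, k -> new TreeMap<>()).put(version, value);
    }

    /** Latest value at or before the requested version, i.e. the entity as of that time slice. */
    public V getAsOf(K key, long version) {
        NavigableMap<Long, V> versions = data.get(key);
        if (versions == null) return null;
        Map.Entry<Long, V> entry = versions.floorEntry(version);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        VersionedStore<String, String> store = new VersionedStore<>();
        store.put("trade-1", 1, "Trade v1");
        store.put("trade-1", 3, "Trade v3");
        System.out.println(store.getAsOf("trade-1", 2)); // prints "Trade v1" - the slice as of v2
    }
}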

So we want to hold entities separately (normalised), to alleviate concerns around consistency and space usage.

Remember, this means the object graph will be split across multiple machines.

[Diagram: Trade, Party and Trader held as separate, singleton entities, each independently versioned]

But binding them back together involves a "distributed join" => lots of network hops.

Whereas in the denormalised model the join is already done.

So what we want is the advantages of a normalised store, at the speed of a denormalised one. This is what using Snowflake Schemas and the Connected Replication pattern is all about.

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys. We can collocate data with common keys, but if the keys crosscut, the only way to collocate is to replicate.

[Diagram: entities related by common keys vs. entities whose keys crosscut]

We tackle this problem with a hybrid model:

[Diagram: the Trade is partitioned; Party and Trader are replicated]

We adapt the concept of a Snowflake Schema, taking the concept of Facts and Dimensions. Everything starts from a Core Fact (Trades, for us).

• Facts are big; dimensions are small.
• Facts have one key that relates them all (used to partition).
• Dimensions have many keys (which crosscut the partitioning key).

Looking at the data: Facts => big, common keys. Dimensions => small, crosscutting keys.

We remember we are a grid. We should avoid the distributed join… so we only want to 'join' data that is in the same process.

[Diagram: Trades and MTMs sharing a common key, held in the same partition]

Use a Key Assignment Policy (e.g. KeyAssociation in Coherence).
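A hedged sketch of what such a key might look like, assuming Oracle Coherence's KeyAssociation interface (which the slide names); the MtmKey class and its fields are invented for illustration. Entries whose keys return the same associated key are stored in the same partition, so a Trade and its MTMs can be joined in-process.

import com.tangosol.net.cache.KeyAssociation;

/** Sketch only: an MTM cache key co-located with its parent Trade via the trade id. */
public class MtmKey implements KeyAssociation, java.io.Serializable {
    private final String mtmId;
    private final String tradeId; // the Fact's partitioning key

    public MtmKey(String mtmId, String tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    /** Coherence routes entries with the same associated key to the same partition. */
    @Override
    public Object getAssociatedKey() {
        return tradeId;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MtmKey
                && ((MtmKey) o).mtmId.equals(mtmId)
                && ((MtmKey) o).tradeId.equals(tradeId);
    }

    @Override
    public int hashCode() {
        return mtmId.hashCode() * 31 + tradeId.hashCode();
    }
}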

So we prescribe different physical storage for Facts and Dimensions: Facts are partitioned; dimensions are replicated.

[Diagram: the Trade (Fact) in partitioned storage; Party and Trader (Dimensions) replicated]

[Diagram: Fact Storage (partitioned) holding Transactions, Cashflows and MTMs across the Data and Query Layers]

Facts are partitioned; dimensions are replicated.

[Diagram: Transactions, Cashflows and MTMs as Facts (distributed/partitioned) in Fact Storage; the Dimensions replicated alongside]

The data volumes back this up as a sensible hypothesis: Facts => big => distribute (partition). Dimensions => small => replicate.

Key Point: we use a variant on a Snowflake Schema to partition the big entities that can be related via a partitioning key, and to replicate the small stuff whose keys can't map to our partitioning key.

So how does this help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern?

[Diagram: a sequence of network calls spread over time – get Cost Centres, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs, get Cost Centres]

But by balancing Replication and Partitioning, we don't need all those hops.

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'.

Stage 1: Get the right keys to query the Facts – join the Dimensions in the Query Layer.

[Diagram: the replicated dimensions are joined in the Query Layer, producing the keys used against the partitioned storage of Transactions, Cashflows and MTMs]

Stage 2: Cluster join to get the Facts – join the facts together efficiently, as we know they are collocated in the same partition.

[Diagram: dimensions joined in the Query Layer; facts joined across the cluster within partitioned storage]

Stage 3: Augment the raw Facts with the relevant Dimensions – bind the dimensions to the result, again in the Query Layer.
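Pulling the three stages into one place, here is an illustrative sketch of the flow (all names, types and structures are invented; this is not the ODC API): dimensions are resolved locally because they are replicated, facts are joined inside their partition because they share the partitioning key, and dimension data is bound back onto the result at the end.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Illustrative three-stage query: dimension lookup, in-partition fact join, dimension bind. */
public final class ThreeStageQuery {
    record Fact(String tradeId, String costCentreId, double mtm) {}

    public static List<String> query(Map<String, String> replicatedCostCentres, // dimension (replicated)
                                     List<Fact> localPartitionFacts,            // facts in this partition
                                     String costCentreName) {
        // Stage 1: resolve the where clause against replicated dimensions -> the keys to query with.
        String costCentreId = replicatedCostCentres.entrySet().stream()
                .filter(e -> e.getValue().equals(costCentreName))
                .map(Map.Entry::getKey)
                .findFirst().orElseThrow();

        // Stage 2: join facts inside the partition - they share the partitioning key, so no extra hops.
        List<Fact> matching = localPartitionFacts.stream()
                .filter(f -> f.costCentreId().equals(costCentreId))
                .collect(Collectors.toList());

        // Stage 3: bind the (replicated) dimension data onto the result.
        return matching.stream()
                .map(f -> f.tradeId() + " mtm=" + f.mtm() + " costCentre=" + costCentreName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> costCentres = Map.of("cc-1", "CC1", "cc-2", "CC2");
        List<Fact> facts = List.of(new Fact("T1", "cc-1", 10.5), new Fact("T2", "cc-2", -3.0));
        System.out.println(query(costCentres, facts, "CC1")); // [T1 mtm=10.5 costCentre=CC1]
    }
}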

Bringing it together:

[Diagram: a Java client API querying Replicated Dimensions and Partitioned Facts]

We never have to do a distributed join. So all the big stuff is held partitioned, and we can join without shipping keys around and without intermediate results.

We get to do this… [normalised, singleton Trade, Party and Trader entities] …and this… [independent versioning of each entity] …and this… [reconstituting a previous time slice] …without the problems of this… or this… all at the speed of this… well, almost.

But there is a fly in the ointment… I lied earlier: these aren't all Facts.

[Diagram: part of the 'Fact' tree is actually a Dimension – it has a different key to the Facts, and it's BIG]

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication Pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected". If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large. But Connected Dimension Data is tiny by comparison. One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' (or 'Used') dimensions.

As data is written to the data store, we keep our 'Connected Caches' up to date.

[Diagram: Fact Storage (partitioned) for Transactions, Cashflows and MTMs in the Data Layer; replicated Dimension Caches in the Processing Layer]

As new Facts are added, the relevant Dimensions that they reference are moved to the processing-layer caches. The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.

Saving a trade causes all its 1st-level references to be triggered.

[Diagram: a Save Trade call hits the partitioned cache in the Data Layer (all normalised); a cache-store trigger fires for the Trade's direct references – Party, Alias, Source Book, Ccy – so they can be pushed to the Query Layer's connected dimension caches]

This updates the connected caches.

The process recurses through the object graph: [Diagram: the Party in turn triggers its own references, such as LedgerBook, which are pushed to the Query Layer's connected dimension caches as well]
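A sketch of the recursion just described (class names and the references() method are invented; ODC drives this from cache-store triggers rather than code like this): when a fact is saved, walk its outgoing arcs and push each newly seen dimension into the replicated 'connected' caches, recursing so that second- and third-level references are picked up too.

import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative Connected Replication: replicate only dimensions reachable from saved facts. */
public final class ConnectedReplication {
    interface Entity {
        String key();
        List<Entity> references(); // outgoing arcs in the domain model
    }

    // Stand-in for the replicated 'connected dimension' caches in the query layer.
    private final Map<String, Entity> connectedDimensions = new ConcurrentHashMap<>();

    /** Called when a fact (e.g. a Trade) is saved: recurse through its arcs. */
    public void onFactSaved(Entity fact) {
        replicateReferences(fact, new HashSet<>());
    }

    private void replicateReferences(Entity entity, Set<String> visited) {
        for (Entity dimension : entity.references()) {
            if (visited.add(dimension.key())) {                      // guard against cycles
                connectedDimensions.put(dimension.key(), dimension); // only 'connected' dimensions
                replicateReferences(dimension, visited);             // recurse: Party -> LedgerBook, ...
            }
        }
    }
}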

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.

With 'Connected Replication', only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:

• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low-latency or very high-throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example, ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 23: Advanced databases   ben stopford

The context of our

problem has changed

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 24: Advanced databases   ben stopford

Simplifying the Contract

bull For some use cases ACIDTransactions are overkill

bull Implementing ACID in a distributed architecture has a significant affect on performance

bull This is where the NoSQL Movement came from

Databases have huge operational overheads

Research with Shore DB indicates only 68 of

instructions contribute to lsquouseful workrsquo

Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
1. The Shared Disk Architecture

• More 'grunt'
• Popular for mid-range data sets
• Multiple machines must contend for ownership (distributed disk/lock contention)

2. The Shared Nothing Architecture

• Massive storage potential
• Massive scalability of processing
• Popular for high-level storage solutions
• Commodity hardware
• Around since the 80s, but only really popular since the Big Data era
• Limited by cross-partition joins

Each machine is responsible for a subset of the records. Each record exists on only one machine.

[Diagram: a client routes each request to the node that owns the key; the key space (1, 2, 3… / 97, 98, 99… / 169, 170… / 244, 245… / 333, 334… / 765, 769…) is split into disjoint ranges, one per node.]
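To make the ownership rule concrete, here is a minimal, self-contained Java sketch of shared-nothing routing. It is illustrative only: the node count, the in-process maps standing in for machines, and the modulo-hash placement are assumptions, not a real grid implementation.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // A toy shared-nothing store: each record lives on exactly one "node",
    // chosen by hashing its key. Node count and the in-process maps that
    // stand in for machines are assumptions made for illustration.
    final class ShardedStore<K, V> {

        private final List<Map<K, V>> nodes = new ArrayList<>();

        ShardedStore(int nodeCount) {
            for (int i = 0; i < nodeCount; i++) {
                nodes.add(new ConcurrentHashMap<>());
            }
        }

        private Map<K, V> ownerOf(K key) {
            // Modulo hashing for brevity; a real grid would use consistent hashing
            // so that adding a node moves only a fraction of the keys.
            return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
        }

        void put(K key, V value) { ownerOf(key).put(key, value); }

        V get(K key) { return ownerOf(key).get(key); }
    }

Any query that needs keys owned by several nodes must visit each of them, which is exactly the cross-partition join limitation listed above.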

3. The In-Memory Database (single address-space)

Databases must cache subsets of the data in memory.

Not knowing what you don't know: most queries still go to disk to "see what they missed" (say, 90% of the data in cache, the rest on disk).

If you can fit it ALL in memory, you know everything.

The architecture of an in-memory database:

• All data is at your fingertips
• Query plans become less important as there is no IO
• Intermediary results are just pointers

Memory is at least 100x faster than disk.

[Diagram: the latency ladder from picoseconds to milliseconds: L1 cache ref, L2 cache ref, main memory ref, 1MB from main memory, cross-network round trip, cross-continental round trip, 1MB from disk/network.]

L1 ref is about 2 clock cycles, or 0.7ns. This is the time it takes light to travel 20cm.

Random vs Sequential Access: memory allows random access; disk only works well for sequential reads.

This makes them very fast.

The proof is in the stats: TPC-H benchmarks on a 1TB data set.

• Exasol: 4,253,937 QphH (in-memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH

NB: TPC-H is a decision-support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's Sparc Supercluster.

So why haven't in-memory databases taken off?

Address spaces are relatively small and of a finite, fixed size.

• What happens when your data grows beyond your available memory? The 'one more bit' problem.

Durability: what happens when you pull the plug?

One solution is distribution.

Distributed In-Memory (Shared Nothing)

Again we spread our data, but this time using only RAM.

[Diagram: as before, a client routes each key to the node that owns it, but every node holds its key range in memory.]

Distribution solves our two problems:

• Solve the 'one more bit' problem by adding more hardware
• Solve the durability problem with backups on another machine

We get massive amounts of parallel processing, but at the cost of losing the single address space.

[Diagram: the space of options: Traditional, Shared Disk, In Memory, Shared Nothing, Distributed In-Memory, and a Simpler Contract.]

Key Point 4: there are three key forces.

• Distribution: gain scalability through a distributed architecture.
• Simplify the contract: improve scalability by picking appropriate ACID properties.
• No disk: all data is held in RAM.

These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse.

ODC

ODC represents a balance between throughput and latency.

What is latency? Latency is a measure of response time.

What is throughput? Throughput is a measure of how much work (how many messages) can be consumed in a prescribed amount of time.

Which is best for latency: the traditional database, or the shared-nothing (distributed) in-memory database?

Which is best for throughput: the traditional database, or the shared-nothing (distributed) in-memory database?

So why do we use distributed in-memory? Keeping data in memory serves latency; plentiful hardware serves throughput.

ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised

• Realtime graph DB
• 450 processes
• 2TB of RAM
• Messaging (topic based) as a system of record (persistence)

The Layers

[Diagram: an Access Layer (Java client APIs) sits above a Query Layer, which sits above a Data Layer holding Transactions, Cashflows and MTMs, backed by a Persistence Layer.]

Three Tools of Distributed Data Architecture: Indexing, Replication and Partitioning.

How should we use these tools?

Replication puts data everywhere: wherever you go, the data will be there. But your storage is limited by the memory on a single node.

Partitioning scales (keys Aa-Ap on one node, and so on): scalable storage, bandwidth and processing. But associating data held in different partitions implies moving it.

So we have some data. Our data is bound together in a model (Trade, Party, Trader, Desk, Name and other sub-entities).

Which we save.

Binding the entities back together involves a "distributed join" => lots of network hops, and the hops have to be spread over time across the network.

Lots of network hops makes it slow.

OK – what if we held it all together, "denormalised"?

Hence denormalisation is FAST (for reads).

But denormalisation implies the duplication of some sub-entities…

…and that means managing consistency over lots of copies…

…and all the duplication means you run out of space really quickly.

Space issues are exacerbated further when data is versioned (the same Trade, Party and Trader duplicated in version 1, 2, 3, 4…)…

…and you need versioning to do MVCC.

And reconstituting a previous time slice becomes very difficult.

So we want to hold entities separately (normalised), to alleviate concerns around consistency and space usage.

Remember, this means the object graph will be split across multiple machines. Each entity is independently versioned, and each piece of data is a singleton.

Binding them back together involves a "distributed join" => lots of network hops.

Whereas in the denormalised model the join is already done.

So what we want is the advantages of a normalised store at the speed of a denormalised one. This is what using Snowflake Schemas and the Connected Replication pattern is all about.

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys. We can collocate data with common keys, but if the keys crosscut, the only way to collocate is to replicate.

We tackle this problem with a hybrid model: Trade is partitioned, while Party and Trader are replicated.

We adapt the concept of a Snowflake Schema, taking the concept of Facts and Dimensions. Everything starts from a Core Fact (Trades, for us). Facts are big; dimensions are small.

Facts have one key that relates them all (used to partition). Dimensions have many keys (which crosscut the partitioning key).

Looking at the data: Facts => big, with common keys. Dimensions => small, with crosscutting keys.

We remember we are a grid. We should avoid the distributed join… so we only want to 'join' data that is in the same process. Trades and MTMs share a common key, so we use a Key Assignment Policy (e.g. KeyAssociation in Coherence) to keep them together, as sketched below.
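A minimal sketch of such a policy, assuming Coherence's KeyAssociation interface; the MtmKey class and its fields are invented for illustration and are not ODC's actual model. Keys that return the same associated key are stored in the same partition, so a Trade and its MTMs can be joined in-process.

    import com.tangosol.net.cache.KeyAssociation;
    import java.io.Serializable;
    import java.util.Objects;

    // Illustrative key for an MTM fact. Returning the parent trade's id from
    // getAssociatedKey() asks the grid to store the MTM in the same partition
    // as its Trade, so a Trade->MTM 'join' never leaves the process.
    // Class and field names are assumptions, not ODC's real model.
    public class MtmKey implements KeyAssociation, Serializable {

        private final String mtmId;
        private final String tradeId; // the common, partitioning key

        public MtmKey(String mtmId, String tradeId) {
            this.mtmId = mtmId;
            this.tradeId = tradeId;
        }

        @Override
        public Object getAssociatedKey() {
            return tradeId; // collocate with the parent Trade
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof MtmKey)) return false;
            MtmKey other = (MtmKey) o;
            return mtmId.equals(other.mtmId) && tradeId.equals(other.tradeId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(mtmId, tradeId);
        }
    }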

So we prescribe different physical storage for Facts and Dimensions: Trade is partitioned, while Party and Trader are replicated.

Facts are partitioned; dimensions are replicated. Fact storage (Transactions, Cashflows, MTMs) is partitioned across the Data Layer, while the dimensions are replicated into the Query Layer.

The data volumes back this up as a sensible hypothesis: Facts => big => distribute. Dimensions => small => replicate.

Key Point: we use a variant on a Snowflake Schema to partition the big entities that can be related via a partitioning key, and replicate the small stuff whose keys can't map to our partitioning key.
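From the Java client's point of view, the prescription might look like the sketch below. The cache names are invented, and it assumes a Coherence-style configuration in which "trades" maps to a partitioned (distributed) scheme and "parties" to a replicated scheme; the point is only that facts and dimensions live in differently shaped stores.

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;

    // Sketch only: cache names are invented, and it is assumed that the
    // cluster's cache configuration maps "trades" to a partitioned (distributed)
    // scheme and "parties" to a replicated scheme.
    public class StoragePrescription {
        public static void main(String[] args) {
            NamedCache trades  = CacheFactory.getCache("trades");   // fact: lives on exactly one partition
            NamedCache parties = CacheFactory.getCache("parties");  // dimension: a copy on every node

            trades.put("T1", "a big fact");
            parties.put("P1", "a small dimension, readable locally everywhere");

            CacheFactory.shutdown();
        }
    }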

So how does this help us to run queries without distributed joins?

This query involves joins between Dimensions and joins between Facts:

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern? A chain of lookups, each a network hop, spread out over time: get Cost Centres, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs…

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause (Where Cost Centre = 'CC1'). Join the Dimensions in the Query Layer to get the right keys with which to query the Facts.

Stage 2: Cluster join to get the Facts. Join the Facts together efficiently across the partitioned storage (Transactions, Cashflows, MTMs), as we know they are collocated.

Stage 3: Augment the raw Facts with the relevant Dimensions, binding the relevant dimensions to the result back in the Query Layer.
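The three stages can be sketched in plain Java over in-process maps. Everything here is illustrative: the entity shape, the field names, and the two maps standing in for the replicated dimension caches and the partitioned fact storage are assumptions, not ODC's API.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Sketch of the three-stage query. Names, fields and the maps standing in
    // for the replicated dimension caches and the partitioned fact storage are
    // all assumptions made for illustration.
    public class ThreeStageQuery {

        record Transaction(String id, String bookId, double amount) {}

        public static void main(String[] args) {
            // Replicated dimensions: available locally on every node.
            Map<String, String> bookByCostCentre = new HashMap<>();
            bookByCostCentre.put("CC1", "BOOK-7");

            // Partitioned facts: Transactions and MTMs sharing a key are collocated.
            Map<String, Transaction> transactions = new HashMap<>();
            Map<String, Double> mtmByTransaction = new HashMap<>();
            transactions.put("T1", new Transaction("T1", "BOOK-7", 1_000_000));
            mtmByTransaction.put("T1", 42.5);

            // Stage 1: resolve the where clause against replicated dimensions (no hops).
            String bookId = bookByCostCentre.get("CC1");

            // Stage 2: join the collocated facts inside the partition that owns them.
            List<String> result = new ArrayList<>();
            for (Transaction t : transactions.values()) {
                if (t.bookId().equals(bookId)) {
                    Double mtm = mtmByTransaction.get(t.id()); // in-process 'cluster join'
                    // Stage 3: bind replicated dimension data back onto the result row.
                    result.add(t.id() + " mtm=" + mtm + " costCentre=CC1");
                }
            }
            result.forEach(System.out::println);
        }
    }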

Bringing it together: the Java client API queries replicated Dimensions and partitioned Facts. We never have to do a distributed join. So all the big stuff is held partitioned, and we can join without shipping keys around and without building intermediate results.

We get to do everything the normalised model promised (separate entities, independent versions, a reconstructable object graph), without the duplication, consistency and space problems of denormalisation, and at almost the same read speed… well, almost.

But there is a fly in the ointment… I lied earlier: these aren't all Facts. Some are Dimensions, and this one is a dimension with a different key to the Facts, and it's BIG.

We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication pattern.

Whilst there are lots of these big dimensions, a large majority are never used; they are not all "connected". If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large, but the Connected Dimension data is tiny by comparison. One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' (or 'Used') dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date. As new Facts are added, the relevant Dimensions that they reference are moved into the processing-layer caches (fact storage stays partitioned in the Data Layer; the dimension caches in the Processing Layer are replicated).

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change: saving a trade causes all its first-level references to be triggered.

[Diagram: a Save Trade call lands in the partitioned cache of the normalised Data Layer; a cache-store trigger pushes the Trade's references (Party, Alias, Source, Book, Ccy) into the connected dimension caches of the Query Layer.]

This updates the connected caches, and the process recurses through the object graph (a Party, for example, leads on to its LedgerBook).

'Connected Replication' is a simple pattern which recurses through the foreign keys in the domain model, ensuring that only 'Connected' dimensions are replicated.
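A minimal sketch of that recursion is shown below. The Entity interface, its getReferences() method and the set standing in for the replicated caches are all hypothetical; they exist only to show the shape of the walk.

    import java.util.Collection;
    import java.util.HashSet;
    import java.util.Set;

    // Sketch of the Connected Replication trigger: when a fact is saved, walk its
    // foreign-key arcs and push every dimension reached into the 'connected'
    // (replicated) caches. Entity, getReferences() and the backing set are
    // hypothetical stand-ins.
    public class ConnectedReplicator {

        interface Entity {
            Object key();
            Collection<Entity> getReferences(); // outgoing foreign-key arcs
        }

        private final Set<Object> connectedCache = new HashSet<>(); // stands in for the replicated dimension caches

        // Called by the cache-store trigger when a fact (e.g. a Trade) is written.
        public void onFactSaved(Entity fact) {
            for (Entity dimension : fact.getReferences()) {
                replicate(dimension);
            }
        }

        private void replicate(Entity dimension) {
            // Recurse only the first time a dimension is seen; dimensions that are
            // already connected (or cycles in the graph) stop the walk.
            if (connectedCache.add(dimension.key())) {
                for (Entity next : dimension.getReferences()) {
                    replicate(next);
                }
            }
        }
    }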

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:

• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low-latency or very high-throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example, ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step against partitioned storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 26: Advanced databases   ben stopford

Avoid that overhead with a simpler contract and avoiding IO

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 27: Advanced databases   ben stopford

Key Point 3

For the very top end data volumes a

simpler contract is mandatory ACID is simply not possible

Key Point 3 (addendum)

But we should always retain ACID properties if our use case allows

it

Options for scaling-out

the traditional

architecture

1 The Shared Disk Architecture

SharedDisk

bull More lsquogruntrsquobull Popular for mid-

range data setsbull Multiple machines

must contend for ownership (Distributed disklock contention)

2 The Shared Nothing Architecture

bull Massive storage potential

bull Massive scalability of processing

bull Popular for high level storage solutions

bull Commodity hardwarebull Around since the 80rsquos

but only really popular since the BigData era

bull Limited by cross partition joins

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

Page 28: Advanced databases   ben stopford


1. The Shared Disk Architecture

Shared Disk
• More 'grunt'
• Popular for mid-range data sets
• Multiple machines must contend for ownership (distributed disk/lock contention)

2. The Shared Nothing Architecture

• Massive storage potential
• Massive scalability of processing
• Popular for high-level storage solutions
• Commodity hardware
• Around since the 80's, but only really popular since the Big Data era
• Limited by cross-partition joins

Each machine is responsible for a subset of the records. Each record exists on only one machine.

[Diagram: a client routing keys (1, 2, 3… / 97, 98, 99… / 169, 170… / 244, 245… / 333, 334… / 765, 769…) to the node that owns each range]
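As a rough sketch of that idea (not code from the talk; the class and method names are invented), the routing that makes each record live on exactly one machine can be as simple as hashing the key to pick an owner:

```java
import java.util.List;

// Minimal sketch of shared-nothing routing: a deterministic hash of the record
// key picks the single node that owns that record.
public class KeyRouter {

    private final List<String> nodes;   // e.g. ["node-0", "node-1", "node-2"]

    public KeyRouter(List<String> nodes) {
        this.nodes = nodes;
    }

    /** Every key maps to exactly one node, so each record exists on one machine only. */
    public String ownerOf(Object key) {
        int hash = key.hashCode() & 0x7fffffff;   // force a non-negative hash
        return nodes.get(hash % nodes.size());
    }
}
```

Real grids use consistent hashing or a partition table rather than a bare modulo, so that adding a node does not reshuffle every key; the principle of one owner per record is the same.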

3. The In Memory Database (single address-space)

Databases must cache subsets of the data in memory.

[Diagram: a cache sitting in front of the data on disk]

Not knowing what you don't know: most queries still go to disk to "see what they missed", even with, say, 90% of the data in cache.

If you can fit it ALL in memory, you know everything.

The architecture of an in-memory database:
• All data is at your fingertips
• Query plans become less important, as there is no IO
• Intermediary results are just pointers

Memory is at least 100x faster than disk.

[Diagram: access latencies on a log scale from ps to ms – L1 cache ref, L2 cache ref, main memory ref, 1MB from main memory, cross-network round trip, 1MB from disk/network, cross-continental round trip. An L1 ref is about 2 clock cycles, or 0.7ns; that is the time it takes light to travel 20cm.]

Random vs Sequential Access: memory allows random access; disk only works well for sequential reads.

This makes them very fast.

The proof is in the stats. TPC-H benchmarks on a 1TB data set:
• Exasol: 4,253,937 QphH (in-memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH

NB – TPC-H is a decision-support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's Sparc Supercluster.

So why haven't in-memory databases taken off?

Address-spaces are relatively small and of a finite, fixed size.
• What happens when your data grows beyond your available memory? (The 'one more bit' problem.)

Durability: what happens when you pull the plug?

One solution is distribution.

Distributed In Memory (Shared Nothing)

Again we spread our data, but this time only using RAM.

[Diagram: a client routing keys across a set of in-memory nodes, each owning a subset of the records]

Distribution solves our two problems:
• Solve the 'one more bit' problem by adding more hardware
• Solve the durability problem with backups on another machine

We get massive amounts of parallel processing, but at the cost of losing the single address space.

[Diagram: the landscape of options – Traditional, Shared Disk, Shared Nothing, In Memory, Distributed In Memory – arranged along a 'Simpler Contract' axis]

Key Point 4: There are three key forces

• Distribution: gain scalability through a distributed architecture
• Simplify the contract: improve scalability by picking appropriate ACID properties
• No disk: all data is held in RAM

These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse.

ODC

ODC represents a balance between throughput and latency.

What is Latency? Latency is a measure of response time.

What is Throughput? Throughput is a measure of the amount of work (or messages) consumed in a prescribed amount of time.

Which is best for latency? [Diagram placing the traditional database and the shared-nothing (distributed) in-memory database on a latency scale]

Which is best for throughput? [Diagram placing the same two architectures on a throughput scale]

So why do we use distributed in-memory? Being in memory serves latency; plentiful hardware serves throughput.

ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised

• Realtime Graph DB
• 450 processes
• 2TB of RAM
• Messaging (topic based) as a system of record (persistence)

The Layers

[Diagram: an Access Layer (Java client API) over a Query Layer, over a Data Layer holding Transactions, Cashflows and MTMs, backed by a Persistence Layer]

Three Tools of Distributed Data Architecture: Indexing, Replication, Partitioning.

How should we use these tools?

Replication puts data everywhere: wherever you go, the data will be there. But your storage is limited by the memory on a node.

Partitioning scales (e.g. keys Aa–Ap on one node): scalable storage, bandwidth and processing. But associating data in different partitions implies moving it.

So we have some data. Our data is bound together in a model.

[Diagram: a Trade linked to a Party and a Trader, with Desk, Name and other sub-entities hanging off them]

Which we save.

[Diagram: the Trade, Party and Trader objects spread across different nodes]

Binding them back together involves a "distributed join" => lots of network hops.

The hops have to be spread over time.

[Diagram: the hops laid out along a time axis on the network]

Lots of network hops makes it slow.
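To make that cost concrete, here is a minimal, hypothetical sketch (not a real grid API) of what reading one normalised trade looks like when each entity type lives in a different partitioned cache – every dereference is a serial network round trip:

```java
import java.util.Map;

// Hypothetical illustration of a "distributed join". Each map stands in for a
// partitioned cache whose entries live on different nodes, so in a real grid
// every get() below is a serial network round trip.
public class DistributedJoinExample {

    record Trade(long id, long partyId, long traderId) {}

    private final Map<Long, Trade> trades;    // partitioned by trade id
    private final Map<Long, String> parties;  // partitioned by party id (other nodes)
    private final Map<Long, String> traders;  // partitioned by trader id (other nodes)

    public DistributedJoinExample(Map<Long, Trade> trades,
                                  Map<Long, String> parties,
                                  Map<Long, String> traders) {
        this.trades = trades;
        this.parties = parties;
        this.traders = traders;
    }

    public String readTrade(long tradeId) {
        Trade t = trades.get(tradeId);               // hop 1: node owning the Trade
        String party = parties.get(t.partyId());     // hop 2: node owning the Party
        String trader = traders.get(t.traderId());   // hop 3: node owning the Trader
        return "Trade " + tradeId + " [" + party + ", " + trader + "]";
    }
}
```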

OK – what if we held it all together, "denormalised"?

Hence denormalisation is FAST (for reads).

But denormalisation implies the duplication of some sub-entities…

…and that means managing consistency over lots of copies…

…and all the duplication means you run out of space really quickly.

Space issues are exaggerated further when data is versioned.

[Diagram: the whole denormalised Trade/Party/Trader blob duplicated for Version 1, 2, 3 and 4]

…and you need versioning to do MVCC.
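As a rough illustration (this is not ODC's actual versioning scheme), an MVCC-style store keys every entity by (id, version), so each new version of a denormalised blob is another full copy:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: an MVCC-style store keeps every version of every
// entity, so a denormalised blob is duplicated in full for each new version.
public class VersionedStore {

    record VersionedKey(String entityId, int version) {}

    private final Map<VersionedKey, String> entries = new HashMap<>();

    public void put(String entityId, int version, String snapshot) {
        entries.put(new VersionedKey(entityId, version), snapshot);
    }

    /** Reading "trade-1 as of version 3" just picks the matching versioned key. */
    public String get(String entityId, int version) {
        return entries.get(new VersionedKey(entityId, version));
    }
}
```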

And reconstituting a previous time slice becomes very difficult.

[Diagram: several versions of Trade, Party and Trader objects that must be stitched back together to rebuild a point-in-time view]

So we want to hold entities separately (normalised), to alleviate concerns around consistency and space usage.

Remember, this means the object graph will be split across multiple machines.

[Diagram: Trade, Party and Trader held on different nodes – independently versioned, each datum a singleton]

Binding them back together involves a "distributed join" => lots of network hops.

Whereas in the denormalised model the join is already done.

So what we want is the advantages of a normalised store at the speed of a denormalised one. This is what using Snowflake Schemas and the Connected Replication pattern is all about.

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys.

We can collocate data with common keys, but if the keys crosscut, the only way to collocate is to replicate.

[Diagram: entities related by a common key vs entities related by crosscutting keys]

We tackle this problem with a hybrid model: the Trade is partitioned, while Party and Trader are replicated.

We adapt the concept of a Snowflake Schema.

Taking the concept of Facts and Dimensions:
• Everything starts from a Core Fact (Trades, for us)
• Facts are big; dimensions are small
• Facts have one key that relates them all (used to partition)
• Dimensions have many keys (which crosscut the partitioning key)

Looking at the data: Facts => big, common keys. Dimensions => small, crosscutting keys.
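A minimal, hypothetical sketch of that key structure in Java (the field names are invented for illustration): the fact carries the single partitioning key, while each dimension has its own key that crosscuts it.

```java
// Hypothetical domain classes illustrating the key structure described above.
final class Trade {          // Fact: big, one per business event, partitioned
    long tradeId;            // the single key used to partition the facts
    long partyId;            // foreign keys into dimensions...
    long traderId;           // ...these crosscut the partitioning key
    double notional;
}

final class Party {          // Dimension: small, shared by many facts
    long partyId;            // its own key, unrelated to any one tradeId
    String name;
}
```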

We remember we are a grid. We should avoid the distributed join…

…so we only want to 'join' data that is in the same process.

Trades and MTMs share a common key, so we use a Key Assignment Policy (e.g. KeyAssociation in Coherence) to keep them in the same partition.
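As a rough sketch of how that might look with Coherence's KeyAssociation interface (the class and field names here are illustrative, not taken from ODC): the MTM's cache key reports the owning trade id as its associated key, so the grid places the MTM entry in the same partition as its Trade.

```java
import com.tangosol.net.cache.KeyAssociation;

// Hypothetical composite key for an MTM fact. Declaring the owning trade id as
// the "associated key" asks Coherence to collocate this entry with the Trade
// entry keyed by that id, so the Trade–MTM join never has to leave the process.
public class MtmKey implements KeyAssociation, java.io.Serializable {

    private final long mtmId;
    private final long tradeId;   // partitioning key shared with the Trade fact

    public MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;           // collocate with the Trade that owns this MTM
    }
}
```

A real key would also need equals() and hashCode(); the point is simply that facts sharing the partitioning key end up in the same partition.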

So we prescribe different physical storage for Facts and Dimensions: the Trade is partitioned; Party and Trader are replicated.

Facts are partitioned; dimensions are replicated.

[Diagram: the Query Layer holds the replicated Dimensions, while the Data Layer holds partitioned Fact Storage for Transactions, Cashflows and MTMs]

The data volumes back this up as a sensible hypothesis: Facts => big => distribute. Dimensions => small => replicate.

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.

So how does this help us to run queries without distributed joins?

This query involves joins between Dimensions and joins between Facts:

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern? A chain of lookups – get Cost Centres, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs – each one a network hop, spread out over time.

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause (Where Cost Centre = 'CC1'). Join the Dimensions in the Query Layer to get the right keys to query the Facts.

Stage 2: Cluster join to get the Facts. Join the facts together efficiently, as we know they are collocated in the Partitioned Storage.

Stage 3: Augment the raw Facts with the relevant Dimensions, binding them to the result in the Query Layer.
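A hedged sketch of how a query layer might run those three stages (the class and method names are invented; plain maps stand in for the replicated dimension caches and the partitioned fact caches):

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical sketch of the three query stages described above.
public class SnowflakeQuery {

    // Replicated dimension data: cost centre -> the transaction keys it owns.
    private final Map<String, Set<Long>> costCentreToTxKeys;
    // Partitioned fact data: transaction key -> the fact rows (transactions,
    // cashflows, MTMs) that are collocated because they share that key.
    private final Map<Long, List<String>> factsByTxKey;
    // Replicated reference data, bound onto the result at the end.
    private final Map<Long, String> referenceDataByTxKey;

    public SnowflakeQuery(Map<String, Set<Long>> costCentreToTxKeys,
                          Map<Long, List<String>> factsByTxKey,
                          Map<Long, String> referenceDataByTxKey) {
        this.costCentreToTxKeys = costCentreToTxKeys;
        this.factsByTxKey = factsByTxKey;
        this.referenceDataByTxKey = referenceDataByTxKey;
    }

    public List<String> query(String costCentre) {
        // Stage 1: join dimensions locally to turn the where clause into fact keys.
        Set<Long> txKeys = costCentreToTxKeys.getOrDefault(costCentre, Set.of());

        // Stage 2: join the facts; in the real grid this runs inside each partition,
        // since facts sharing a transaction key are collocated.
        return txKeys.stream()
                .flatMap(k -> factsByTxKey.getOrDefault(k, List.of()).stream()
                        // Stage 3: bind replicated dimension data onto each fact row.
                        .map(fact -> fact + " + " + referenceDataByTxKey.getOrDefault(k, "")))
                .collect(Collectors.toList());
    }
}
```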

Bringing it together: the Java client API sees replicated Dimensions and partitioned Facts, and we never have to do a distributed join. So all the big stuff is held partitioned, and we can join without shipping keys around and without intermediate results.

We get to keep the normalised object graph, its versions, and the ability to reconstitute a previous time slice… without the consistency and space problems of denormalisation… all at the speed of the denormalised model (well, almost).

But there is a fly in the ointment… I lied earlier: these aren't all Facts. Some of them are Dimensions – they have a different key to the Facts, and they're BIG.

We can't replicate really big stuff; we'll run out of space. Big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication Pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some of it is quite large. But Connected Dimension data is tiny by comparison. One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' (or 'Used') dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date.

[Diagram: the Data Layer holds partitioned Fact Storage (Transactions, Cashflows, MTMs); the Processing Layer holds the replicated Dimension Caches]

As new Facts are added, the relevant Dimensions that they reference are moved to the processing-layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.

Saving a trade causes all its 1st-level references to be triggered.

[Diagram: a Save Trade into the partitioned cache in the Data Layer fires a trigger via the cache store]

This updates the connected caches in the Query Layer.

The process recurses through the object graph: Trade → Party, Alias, Source, Book, Ccy, and from Party on to LedgerBook, and so on.

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
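A hedged sketch of that recursion (the interface and method names are invented for illustration): when a fact is saved, walk its references and pull any newly 'connected' dimension into the replicated layer, recursing through its own references in turn.

```java
import java.util.*;

// Hypothetical sketch of the Connected Replication trigger: when a fact is
// written, recurse through its references and replicate any dimension that
// has just become 'connected'.
public class ConnectedReplicator {

    /** A stored entity plus the entities it references via foreign keys. */
    public interface Entity {
        Object key();
        Collection<Entity> references();
    }

    private final Set<Object> replicatedKeys = new HashSet<>();            // keys already replicated
    private final Map<Object, Entity> replicatedCache = new HashMap<>();   // stand-in for the dimension caches

    /** Called by the cache-store trigger whenever a fact (e.g. a Trade) is saved. */
    public void onFactSaved(Entity fact) {
        for (Entity dimension : fact.references()) {
            replicateIfNew(dimension);
        }
    }

    private void replicateIfNew(Entity dimension) {
        // Only recurse the first time we see this key; already-connected
        // dimensions (and their sub-graph) are already in the replicated layer.
        if (replicatedKeys.add(dimension.key())) {
            replicatedCache.put(dimension.key(), dimension);
            for (Entity next : dimension.references()) {
                replicateIfNew(next);   // recurse through the object graph
            }
        }
    }
}
```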

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so we can do any join in a single step, against Partitioned Storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com
• Questions?



Page 31: Advanced databases   ben stopford

2. The Shared Nothing Architecture

• Massive storage potential
• Massive scalability of processing
• Popular for high-level storage solutions
• Commodity hardware
• Around since the 80s, but only really popular since the Big Data era
• Limited by cross-partition joins

Each machine is responsible for a subset of the records. Each record exists on only one machine.

(Diagram: records 1, 2, 3… 97, 98, 99… 765, 769… spread across several nodes, with a client routing each request to the node that owns the key.)
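
A minimal sketch of that routing idea, assuming a hypothetical Node interface (the names here are illustrative, not from any product): each key hashes to exactly one node, so each record lives in exactly one place.

    import java.util.List;

    interface Node {
        void put(Object key, Object value);
        Object get(Object key);
    }

    class ShardedStore {
        private final List<Node> nodes;

        ShardedStore(List<Node> nodes) { this.nodes = nodes; }

        // Every key maps to exactly one node, so every record exists on one machine only.
        private Node owner(Object key) {
            return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
        }

        void put(Object key, Object value) { owner(key).put(key, value); }
        Object get(Object key)             { return owner(key).get(key); }
    }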

3. The In-Memory Database (single address-space)

Databases must cache subsets of the data in memory. (Diagram: data on disk, 90% in cache.)

Not knowing what you don't know: most queries still go to disk to "see what they missed".

If you can fit it ALL in memory, you know everything.

The architecture of an in-memory database:

• All data is at your fingertips
• Query plans become less important as there is no IO
• Intermediary results are just pointers
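
A small illustrative sketch (hypothetical names, not a real engine) of why intermediary results can be just pointers once everything lives in one address space: an index maps a value straight to references of the already-resident objects, so nothing is copied or re-read.

    import java.util.*;

    class InMemoryIndex<T> {
        private final Map<Object, List<T>> index = new HashMap<>();

        void add(Object indexedValue, T row) {
            index.computeIfAbsent(indexedValue, k -> new ArrayList<>()).add(row);
        }

        // The 'result set' is just a list of references to objects already in memory.
        List<T> lookup(Object indexedValue) {
            return index.getOrDefault(indexedValue, Collections.emptyList());
        }
    }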

Memory is at least 100x faster than disk.

(Chart: access latencies from picoseconds to milliseconds – L1 cache ref, L2 cache ref, main memory ref, 1MB from main memory, cross-network round trip, cross-continental round trip, 1MB from disk/network. An L1 ref is about 2 clock cycles, or 0.7ns; this is the time it takes light to travel 20cm.)

Random vs Sequential Access: memory allows random access; disk only works well for sequential reads.

This makes them very fast.

The proof is in the stats. TPC-H benchmarks on a 1TB data set:

• Exasol: 4,253,937 QphH (in-memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH
• NB: TPC-H is a decision-support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's Sparc SuperCluster.

So why haven't in-memory databases taken off?

Address-spaces are relatively small and of a finite, fixed size.

• What happens when your data grows beyond your available memory? The 'one more bit' problem.

Durability: what happens when you pull the plug?

One solution is distribution.

Distributed In-Memory (Shared Nothing)

Again we spread our data, but this time only using RAM.

(Diagram: the same records spread across nodes, but held in memory on each node rather than on disk.)

Distribution solves our two problems:

• Solve the 'one more bit' problem by adding more hardware
• Solve the durability problem with backups on another machine
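
A toy sketch of that second point, building on the hypothetical Node and ShardedStore above (again, illustrative names only): every write goes synchronously to a primary node and to a backup on a different machine, so losing one node does not lose the data.

    import java.util.List;

    class ReplicatedShardedStore {
        private final List<Node> nodes;

        ReplicatedShardedStore(List<Node> nodes) { this.nodes = nodes; }

        void put(Object key, Object value) {
            int primary = Math.floorMod(key.hashCode(), nodes.size());
            int backup  = (primary + 1) % nodes.size();   // backup always lands on a different node
            nodes.get(primary).put(key, value);
            nodes.get(backup).put(key, value);            // synchronous backup write for durability
        }

        Object get(Object key) {
            return nodes.get(Math.floorMod(key.hashCode(), nodes.size())).get(key);
        }
    }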

We get massive amounts of parallel processing, but at the cost of losing the single address space.

(Diagram: the spectrum of architectures – Traditional, Shared Disk, In Memory, Shared Nothing, Distributed In-Memory – plotted against an increasingly simple contract.)

Key Point 4: There are three key forces.

• Distribution: gain scalability through a distributed architecture
• Simplify the contract: improve scalability by picking appropriate ACID properties
• No disk: all data is held in RAM

These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse.

ODC

ODC represents a balance between throughput and latency.

What is latency? Latency is a measure of response time.

What is throughput? Throughput is a measure of the consumption of work/messages in a prescribed amount of time.

Which is best for latency? (Diagram: a latency scale running from the traditional database to the shared nothing, distributed in-memory database.)

Which is best for throughput? (Diagram: the same scale, this time for throughput.)

So why do we use distributed in-memory? In-memory for latency; plentiful hardware for throughput.

ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised

• Realtime Graph DB
• 450 processes
• 2TB of RAM
• Messaging (topic based) as a system of record (persistence)

The Layers

(Diagram: the ODC layers – an Access Layer exposing Java client APIs, a Query Layer, a Data Layer holding Transactions, Cashflows and MTMs, and a Persistence Layer.)

Three Tools of Distributed Data Architecture: Indexing, Replication, Partitioning.

How should we use these tools?

Replication puts data everywhere: wherever you go, the data will be there. But your storage is limited by the memory on a node.

Partitioning scales. (Diagram: keys Aa–Ap held on a single partition.) Scalable storage, bandwidth and processing, but associating data in different partitions implies moving it.

So we have some data. Our data is bound together in a model.

(Diagram: a Trade referencing a Party and a Trader, which in turn reference further sub-entities such as Desk and Name.)
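
For the discussion that follows, a minimal sketch of such a model (field names are invented for illustration): the Trade holds foreign keys to its Party and Trader rather than embedding them.

    class Trade  { long id; String partyId; String traderId; double notional; }
    class Party  { String id; String name; }
    class Trader { String id; String name; String desk; }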

Which we save. (Diagram: the Trade, Party and Trader objects end up on different machines.)

Binding them back together involves a "distributed join" => lots of network hops.

The hops have to be spread over time. (Diagram: each hop is a separate round trip on the network timeline.)

Lots of network hops makes it slow.
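
To make the cost concrete, here is a naive fragment (hypothetical client code, reusing the ShardedStore and domain sketches above) of what reassembling one Trade looks like when its Party and Trader live on other nodes: every reference becomes another round trip.

    // 'store' is the ShardedStore sketched earlier; every get() is a network hop
    // to whichever node owns that key.
    Trade  trade  = (Trade)  store.get("trade-42");
    Party  party  = (Party)  store.get(trade.partyId);   // second hop
    Trader trader = (Trader) store.get(trade.traderId);  // third hop
    // ...and every further reference (Desk, Book, Ccy) adds another hop.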

OK – what if we held it all together, "denormalised"?

Hence denormalisation is FAST (for reads).

Denormalisation implies the duplication of some sub-entities…

…and that means managing consistency over lots of copies…

…and all the duplication means you run out of space really quickly.

Space issues are exaggerated further when data is versioned…

(Diagram: four full copies of the Trade–Party–Trader graph, one per version.)

…and you need versioning to do MVCC.

And reconstituting a previous time slice becomes very difficult.

(Diagram: mismatched versions of Trades, Parties and Traders that must be stitched back together to rebuild a point-in-time view.)

So we want to hold entities separately (normalised), to alleviate concerns around consistency and space usage.

Remember this means the object graph will be split across multiple machines.

(Diagram: Trade, Party and Trader held on separate nodes – each entity independently versioned, each piece of data a singleton.)

Binding them back together involves a "distributed join" => lots of network hops.

Whereas in the denormalised model, the join is already done.

So what we want are the advantages of a normalised store at the speed of a denormalised one.

This is what using Snowflake Schemas and the Connected Replication pattern is all about.

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys.

We can collocate data with common keys, but if the keys crosscut, the only way to collocate is to replicate.

(Diagram: entities that share a common key versus entities joined through crosscutting keys.)

We tackle this problem with a hybrid model. (Diagram: Trades partitioned; Party and Trader replicated.)

We adapt the concept of a Snowflake Schema, taking the concept of Facts and Dimensions.

Everything starts from a Core Fact (Trades, for us).

Facts are big; dimensions are small.

Facts have one key that relates them all (used to partition). Dimensions have many keys (which crosscut the partitioning key).

Looking at the data: Facts => big, common keys. Dimensions => small, crosscutting keys.

We remember we are a grid. We should avoid the distributed join…

… so we only want to 'join' data that is in the same process.

(Diagram: Trades and MTMs sharing a common key.) Use a key assignment policy (e.g. KeyAssociation in Coherence) so that facts sharing that key are stored in the same partition.
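
A sketch of the idea in the spirit of Coherence's KeyAssociation (the interface and class names below are illustrative, not lifted from ODC): the MTM's key declares the trade id it is associated with, so the grid places every MTM in the same partition as its Trade and a Trade–MTM join never leaves the process.

    // Minimal stand-in for a key-association contract; Coherence's real interface
    // exposes the same idea through getAssociatedKey().
    interface AssociatedKey {
        Object associatedKey();   // the key whose partition this key should follow
    }

    class MtmKey implements AssociatedKey {
        final String mtmId;
        final String tradeId;

        MtmKey(String mtmId, String tradeId) { this.mtmId = mtmId; this.tradeId = tradeId; }

        // MTMs are partitioned by their parent trade id, so they are collocated with the Trade fact.
        @Override public Object associatedKey() { return tradeId; }
    }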

So we prescribe different physical storage for Facts and Dimensions: Trades are partitioned; Party and Trader are replicated.

Facts are partitioned; dimensions are replicated.

(Diagram: the Data and Query layers, with Transactions, Cashflows and MTMs in partitioned Fact Storage and the dimensions replicated alongside – facts are distributed/partitioned, dimensions are replicated.)

The data volumes back this up as a sensible hypothesis: Facts => big => distribute. Dimensions => small => replicate.

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.
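
As a rough sketch of that prescription (purely illustrative; ODC's actual configuration is not shown in the talk), the decision can be captured as a simple rule keyed on entity type:

    import java.util.Map;

    enum Storage { PARTITIONED_BY_TRADE_KEY, REPLICATED }

    class StoragePrescription {
        // Facts share the trade partitioning key; dimensions' keys crosscut it.
        static final Map<String, Storage> PRESCRIPTION = Map.of(
            "Trade",       Storage.PARTITIONED_BY_TRADE_KEY,
            "Transaction", Storage.PARTITIONED_BY_TRADE_KEY,
            "Cashflow",    Storage.PARTITIONED_BY_TRADE_KEY,
            "MTM",         Storage.PARTITIONED_BY_TRADE_KEY,
            "Party",       Storage.REPLICATED,
            "Trader",      Storage.REPLICATED
        );
    }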

So how does this help us to run queries without distributed joins?

This query involves joins between Dimensions and joins between Facts:

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern?

(Diagram: a chain of sequential lookups spread over the network timeline – get Cost Centres, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs – each one a separate round trip.)

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause (Where Cost Centre = 'CC1'). Join the Dimensions in the Query Layer to get the right keys to query the Facts.

Stage 2: Cluster Join to get the Facts. Join the facts together efficiently, as we know they are collocated in the same partition.

Stage 3: Augment the raw Facts with the relevant Dimensions, binding them to the result in the Query Layer.

(Each stage runs against the same query: Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'.)
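
A condensed sketch of those three stages (method and type names are invented for illustration; this is not the ODC API): the dimension joins run locally against replicated data, and only the fact lookup touches the partitioned storage, keyed so it never crosses partitions.

    import java.util.List;

    class SnowflakeQueryEngine {

        List<Result> query(String costCentre) {
            // Stage 1: resolve the where clause against replicated dimensions, locally,
            // producing the partitioning keys of the facts we need.
            List<String> tradeKeys = dimensionCache.tradeKeysForCostCentre(costCentre);

            // Stage 2: fetch and join the facts; Transactions, Cashflows and MTMs for a
            // trade key are collocated, so this join happens inside each partition.
            List<FactRow> facts = factStore.joinFactsByTradeKey(tradeKeys);

            // Stage 3: bind the relevant replicated dimensions onto the raw facts.
            return facts.stream().map(dimensionCache::enrich).toList();
        }

        // Collaborators assumed by this sketch.
        interface DimensionCache { List<String> tradeKeysForCostCentre(String cc); Result enrich(FactRow f); }
        interface FactStore      { List<FactRow> joinFactsByTradeKey(List<String> tradeKeys); }
        static class FactRow {}
        static class Result  {}

        private final DimensionCache dimensionCache;
        private final FactStore factStore;

        SnowflakeQueryEngine(DimensionCache d, FactStore f) { this.dimensionCache = d; this.factStore = f; }
    }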

Bringing it together. (Diagram: the Java client API sitting over Replicated Dimensions and Partitioned Facts.)

We never have to do a distributed join. So all the big stuff is held partitioned, and we can join without shipping keys around and without holding intermediate results.

We get to do this… (hold the entities normalised and split across machines) …and this… (version them independently) …and this… (reconstitute previous time slices) …without the problems of duplicated data and consistency management, or of running out of space… all at the speed of the denormalised model… well, almost.

But there is a fly in the ointment…

I lied earlier. These aren't all Facts. (Diagram: part of what was labelled Facts is really Dimensions.)

This is a dimension: it has a different key to the Facts, and it's BIG.

We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication Pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large. But Connected Dimension Data is tiny by comparison.

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store, we keep our 'Connected Caches' up to date.

(Diagram: the Data Layer with partitioned Fact Storage for Transactions, Cashflows and MTMs, and replicated Dimension Caches in the Processing Layer.)

As new Facts are added, relevant Dimensions that they reference are moved to processing-layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.

Saving a trade causes all its 1st-level references to be triggered.

(Diagram: a Save Trade hitting the partitioned cache in the normalised Data Layer; a cache-store trigger pushes the Trade's direct references – Party, Alias, Source, Book, Ccy – towards the Query Layer's connected dimension caches.)

This updates the connected caches. (Diagram: the Party, Alias, Source, Book and Ccy referenced by the saved Trade appear in the Query Layer's connected dimension caches.)

The process recurses through the object graph. (Diagram: the recursion continues from Party to its own references, such as LedgerBook, so second-level dimensions reach the connected caches too.)
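
A compact sketch of that trigger-and-recurse behaviour (types and method names invented for illustration; the real ODC trigger is not shown in the talk): when a fact is stored, we walk its foreign keys and promote each referenced dimension, then recurse into that dimension's own references.

    import java.util.*;

    class ConnectedReplicator {

        interface Entity { Collection<Object> foreignKeys(); }    // the arcs on the domain model
        interface NormalisedStore { Entity load(Object key); }    // the data layer (all normalised)

        private final NormalisedStore dataLayer;
        private final Map<Object, Entity> connectedDimensionCache = new HashMap<>();  // replicated layer

        ConnectedReplicator(NormalisedStore dataLayer) { this.dataLayer = dataLayer; }

        // Called from the cache-store trigger when a fact (e.g. a Trade) is saved.
        void onFactSaved(Entity fact) {
            for (Object fk : fact.foreignKeys()) {
                promote(fk);
            }
        }

        private void promote(Object key) {
            if (connectedDimensionCache.containsKey(key)) return;  // already connected; stop recursing
            Entity dimension = dataLayer.load(key);
            connectedDimensionCache.put(key, dimension);           // replicate only 'connected' dimensions
            for (Object fk : dimension.foreignKeys()) {
                promote(fk);                                       // recurse through the object graph
            }
        }
    }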

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.

With 'Connected Replication', only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:

• Data set size: the size of the connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so we can do any join in a single step against partitioned storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 32: Advanced databases   ben stopford

Each machine is responsible for a subset of the records Each record

exists on only one machine

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 33: Advanced databases   ben stopford

3 The In Memory Database

(single address-space)

Databases must cache subsets of the data in

memory

Cache

Not knowing what you donrsquot know

Most queries still go to disk to ldquosee what they missedrdquo

Data on Disk

90 in Cache

If you can fit it ALL in memory you know everything

The architecture of an in memory database

bull All data is at your fingertips

bull Query plans become less important as there is no IO

bull Intermediary results are just pointers

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

Page 34: Advanced databases   ben stopford

Databases must cache subsets of the data in memory

Not knowing what you don't know

Most queries still go to disk to "see what they missed" (diagram: Data on Disk vs 90% in Cache)

If you can fit it ALL in memory you know everything

The architecture of an in memory database

• All data is at your fingertips

• Query plans become less important as there is no IO

• Intermediary results are just pointers
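To make "intermediary results are just pointers" concrete, here is a toy Java sketch (illustrative only, not ODC code): in a single address space a lookup hands back a reference, and binding related objects together is just pointer traversal.

import java.util.HashMap;
import java.util.Map;

// Toy illustration: with everything in one address space, a lookup returns a
// reference rather than a copy, and binding a Trade to its Party is just a
// field access. No IO, no intermediate result sets to materialise.
public class InProcessLookup {
    record Party(String name) {}
    record Trade(String id, Party party) {}

    public static void main(String[] args) {
        Map<String, Trade> trades = new HashMap<>();
        trades.put("t1", new Trade("t1", new Party("Counterparty A")));

        Trade t = trades.get("t1");            // a HashMap lookup: tens of nanoseconds
        System.out.println(t.party().name());  // the "join" is just pointer traversal
    }
}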

Memory is at least 100x faster than disk

(Latency scale diagram, spanning picoseconds to milliseconds: L1 Cache Ref, L2 Cache Ref, Main Memory Ref, 1MB Main Memory, Cross Network Round Trip, Cross Continental Round Trip, 1MB Disk/Network)

An L1 ref is about 2 clock cycles, or 0.7ns. This is the time it takes light to travel 20cm.

Random vs Sequential Access: memory allows random access; disk only works well for sequential reads.
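As a rough illustration of the access-pattern point (a toy micro-benchmark, not from the original deck), the sketch below sums the same array once sequentially and once via a shuffled index order; the random walk defeats the cache and prefetcher, and the gap is far larger again when the medium is disk.

import java.util.Random;

// Rough illustration (not a rigorous benchmark): summing the same array
// sequentially vs. via randomly shuffled indices. The random walk defeats
// cache prefetching, so it typically runs several times slower.
public class AccessPatterns {
    public static void main(String[] args) {
        int n = 1 << 24;                       // ~16M ints (~64MB)
        int[] data = new int[n];
        int[] order = new int[n];
        Random rnd = new Random(42);
        for (int i = 0; i < n; i++) { data[i] = i; order[i] = i; }
        for (int i = n - 1; i > 0; i--) {      // Fisher-Yates shuffle of the access order
            int j = rnd.nextInt(i + 1);
            int tmp = order[i]; order[i] = order[j]; order[j] = tmp;
        }

        long t0 = System.nanoTime();
        long seqSum = 0;
        for (int i = 0; i < n; i++) seqSum += data[i];          // sequential scan
        long t1 = System.nanoTime();
        long rndSum = 0;
        for (int i = 0; i < n; i++) rndSum += data[order[i]];   // random walk
        long t2 = System.nanoTime();

        System.out.printf("sequential: %d ms, random: %d ms (sums %d/%d)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, seqSum, rndSum);
    }
}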

This makes them very fast

The proof is in the stats: TPC-H Benchmarks on a 1TB data set
• Exasol: 4,253,937 QphH (In-Memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH
• NB: TPC-H is a decision support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's Sparc Supercluster.

So why haven't in-memory databases taken off?

Address-Spaces are relatively small and of a finite, fixed size

• What happens when your data grows beyond your available memory? The 'One more bit problem'

Durability

What happens when you pull the plug?

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

(Diagram: a Client talking to nodes that each hold a range of keys: 1, 2, 3…; 97, 98, 99…; 169, 170…; 244, 245…; 333, 334…; 765, 769…)

Distribution solves our two problems

• Solve the 'one more bit' problem by adding more hardware

• Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of losing the single address space

(Diagram labels: Traditional, Shared Disk, Shared Nothing, In Memory, Distributed In Memory, Simpler Contract)

Key Point 4: There are three key forces

• Distribution: gain scalability through a distributed architecture

• Simplify the contract: improve scalability by picking appropriate ACID properties

• No Disk: all data is held in RAM

These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse

ODC

ODC represents a balance between throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of work (messages) in a prescribed amount of time

Which is best for latency? (Diagram: Traditional Database vs Shared Nothing (Distributed) In-Memory Database)

Which is best for throughput? (Diagram: Traditional Database vs Shared Nothing (Distributed) In-Memory Database)

So why do we use distributed in-memory? (Diagram labels: In Memory, Plentiful hardware, Latency, Throughput)

ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised

• Realtime Graph DB

• 450 processes

• Messaging (Topic Based) as a system of record (persistence)

• 2TB of RAM

The Layers (diagram: an Access Layer exposing Java client APIs, a Query Layer, a Data Layer holding Transactions, Cashflows and Mtms, and a Persistence Layer)

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scales (diagram: keys Aa-Ap assigned to one partition)

Scalable storage, bandwidth and processing

Associating data in different partitions implies moving it
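A minimal sketch of the idea (invented names, not the ODC implementation): a hash of the key decides which node owns each entry, so single-key access is local, while relating keys that land on different nodes means crossing the network.

import java.util.List;

// Minimal hash-partitioning sketch: each key maps to exactly one node, so a
// single-key lookup stays local to that node, while relating keys that hash
// to different nodes means shipping data across the network.
public class Partitioner {
    private final int nodeCount;
    public Partitioner(int nodeCount) { this.nodeCount = nodeCount; }

    int nodeFor(String key) {
        return Math.floorMod(key.hashCode(), nodeCount);   // stable key -> node mapping
    }

    public static void main(String[] args) {
        Partitioner p = new Partitioner(4);
        for (String id : List.of("trade-1", "trade-2", "trade-97", "trade-333")) {
            System.out.println(id + " -> node " + p.nodeFor(id));
        }
        // Two entities only collocate (and so join cheaply) if they share the
        // same partitioning key; otherwise one of them has to move.
    }
}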

So we have some data. Our data is bound together in a model (diagram: a Trade referencing Party, Trader, Desk, Name and other sub-entities)

Which we save (diagram: the Trade, Party and Trader entities are stored separately)

Binding them back together involves a "distributed join" => lots of network hops (diagram: Trade, Party and Trader spread across nodes)

The hops have to be spread over time (diagram: network hops laid out along a time axis)

Lots of network hops make it slow
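A sketch of why the hops hurt, using hypothetical interfaces rather than the ODC API: rebuilding one trade's object graph from normalised, partitioned stores costs one remote round trip per reference.

import java.util.Map;

// Hypothetical sketch (not the ODC API): each get() stands in for a network
// hop to whichever node owns that key. Reassembling one trade's object graph
// from normalised, partitioned stores costs one round trip per reference.
interface RemoteStore {
    Map<String, Object> get(String entityType, String key);   // simulated remote call
}

public class NaiveDistributedJoin {
    private final RemoteStore store;
    NaiveDistributedJoin(RemoteStore store) { this.store = store; }

    Map<String, Object> loadTradeGraph(String tradeId) {
        Map<String, Object> trade  = store.get("Trade",  tradeId);                        // hop 1
        Map<String, Object> party  = store.get("Party",  (String) trade.get("partyId"));  // hop 2
        Map<String, Object> trader = store.get("Trader", (String) trade.get("traderId")); // hop 3
        trade.put("party", party);
        trade.put("trader", trader);
        return trade;    // three sequential round trips for a single trade
    }
}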

OK – what if we held it all together? "Denormalised"

Hence denormalisation is FAST (for reads)

Denormalisation implies the duplication of some sub-entities

…and that means managing consistency over lots of copies

…and all the duplication means you run out of space really quickly

Space issues are exacerbated further when data is versioned

(Diagram: four denormalised copies of the Trade, Party, Trader graph, Version 1 through Version 4)

…and you need versioning to do MVCC
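For illustration only (not the ODC versioning scheme), a minimal append-only version store shows why versioned data multiplies copies: every write adds a new version so that readers can ask for the state as of an earlier point.

import java.util.*;

// Append-only versioning sketch: each write adds a new version of the value,
// keyed by an increasing version number, so a reader can ask for the state
// "as of" any earlier version (the basis of MVCC-style reads).
class VersionedStore<K, V> {
    private final Map<K, NavigableMap<Long, V>> versions = new HashMap<>();
    private long clock = 0;

    synchronized long put(K key, V value) {
        versions.computeIfAbsent(key, k -> new TreeMap<>()).put(++clock, value);
        return clock;
    }

    synchronized V getAsOf(K key, long version) {
        NavigableMap<Long, V> history = versions.getOrDefault(key, new TreeMap<>());
        Map.Entry<Long, V> e = history.floorEntry(version);   // latest version <= requested
        return e == null ? null : e.getValue();
    }
}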

And reconstituting a previous time slice becomes very difficult (diagram: many overlapping Trade, Party and Trader versions to stitch back together)

So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

(Diagram: Trade, Party and Trader held separately, independently versioned, each datum a singleton)

Binding them back together involves a "distributed join" => lots of network hops (diagram: the normalised entities again spread across nodes)

Whereas in the denormalised model the join is already done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern is all about

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys

We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate (diagram: common keys vs crosscutting keys)

We tackle this problem with a hybrid model (diagram: Trade partitioned; Party and Trader replicated)

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are big, dimensions are small

Facts have one key that relates them all (used to partition)

Dimensions have many keys (which crosscut the partitioning key)

Looking at the data: Facts => big, common keys; Dimensions => small, crosscutting keys
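To ground the terminology, a toy domain model (field names invented for illustration): the Fact carries the single partitioning key, while the Dimensions carry their own small, crosscutting keys.

// Toy model of the Fact/Dimension split (names are illustrative, not ODC's).
// The Fact (Trade) carries the one key used for partitioning; the Dimensions
// (Party, Trader) have their own keys, which crosscut the partitioning key.
record Party(String partyId, String name) {}            // small dimension
record Trader(String traderId, String desk) {}          // small dimension

record Trade(String tradeId,        // partitioning key: joins between facts use it
             String partyId,        // foreign key into a replicated dimension
             String traderId,       // foreign key into a replicated dimension
             double notional) {}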

We remember we are a grid. We should avoid the distributed join

… so we only want to 'join' data that is in the same process

(Diagram: Trades and MTMs share a common key.) Use a Key Assignment Policy (e.g. KeyAssociation in Coherence)
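A sketch of what a key-association policy looks like, based on Coherence's KeyAssociation interface (verify the exact package and signature against your Coherence version): the MTM key declares the owning trade's id as its associated key, so both land in the same partition and the join never leaves the node.

import com.tangosol.net.cache.KeyAssociation;

// Sketch only: an MTM is keyed by its own id, but declares the owning trade's
// id as its "associated key", so Coherence places the MTM entry in the same
// partition as its Trade. (Check KeyAssociation's package/signature against
// your Coherence version before relying on this.)
public class MtmKey implements KeyAssociation {
    private final String mtmId;
    private final String tradeId;   // the common, partitioning key

    public MtmKey(String mtmId, String tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;             // collocate with the parent Trade's partition
    }

    // equals() and hashCode() omitted for brevity; real cache keys need both.
}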

So we prescribe different physical storage for Facts and Dimensions (diagram: Trade partitioned; Party and Trader replicated)

Facts are partitioned, dimensions are replicated (diagram: a Data Layer whose partitioned Fact Storage holds Transactions, Cashflows and Mtms, a Query Layer above it, and the Trade, Party, Trader model)

Facts are partitioned, dimensions are replicated (diagram: Transactions, Cashflows and Mtms sit in partitioned Fact Storage; facts are distributed/partitioned, dimensions are replicated)

The data volumes back this up as a sensible hypothesis: Facts => big => distribute; Dimensions => small => replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key

So how does this help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern

(Diagram, hops spread across the network over time: Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centers)

But by balancing Replication and Partitioning we don't need all those hops

Stage 1: Focus on the where clause

Where Cost Centre = 'CC1'

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1: Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2: Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3: Augment raw Facts with relevant Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result
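A compressed, in-process sketch of the three stages (all names invented; plain maps stand in for the replicated and partitioned caches):

import java.util.*;
import java.util.stream.Collectors;

// Illustrative version of the three query stages (invented names):
//   Stage 1: resolve the where-clause against replicated dimensions (no hops)
//   Stage 2: join the facts, which are collocated by their partitioning key
//   Stage 3: bind replicated dimension data onto the results (again, no hops)
public class StagedQuery {
    private final Map<String, String> costCentreByTransaction = new HashMap<>();              // replicated dimension
    private final Map<String, List<Map<String, Object>>> factsByTransaction = new HashMap<>(); // partitioned facts

    List<Map<String, Object>> whereCostCentre(String costCentre) {
        // Stage 1: where-clause -> the fact keys we need
        Set<String> txIds = costCentreByTransaction.entrySet().stream()
                .filter(e -> e.getValue().equals(costCentre))
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());

        // Stage 2: gather the already-collocated facts for those keys
        List<Map<String, Object>> rows = txIds.stream()
                .flatMap(id -> factsByTransaction.getOrDefault(id, List.of()).stream())
                .collect(Collectors.toList());

        // Stage 3: decorate each row with the relevant dimension data
        rows.forEach(row -> row.put("costCentre", costCentre));
        return rows;
    }
}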

Bringing it together

(Diagram: a Java client API over Replicated Dimensions and Partitioned Facts)

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and having intermediate results

We get to do this… (diagram: normalised Trade, Party and Trader entities)

…and this… (diagram: Trade, Party and Trader versions 1 to 4)

…and this (diagram: reconstituting earlier versions from many Trade, Party and Trader instances)

…without the problems of this…

…or this

all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts (diagram: Facts vs Dimensions)

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem

Fortunately there is a simple solution: the Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate 'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date (diagram: a Data Layer whose partitioned Fact Storage holds Transactions, Cashflows and Mtms, plus a Processing Layer of replicated Dimension Caches)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st-level references to be triggered

(Diagram: 'Save Trade' writes to a Partitioned Cache in the Data Layer (all normalised); a Cache Store trigger fires for the trade's references: Party, Alias, Source, Book, Ccy, feeding the Query Layer's connected dimension caches)

This updates the connected caches

(Diagram: the referenced dimensions, Party, Alias, Source, Book and Ccy, now appear in the Query Layer's connected dimension caches)

The process recurses through the object graph

(Diagram: the recursion continues to second-level references such as LedgerBook and further Party entities, again updating the Query Layer's connected dimension caches)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
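A sketch of that recursion (hypothetical types; in ODC it hangs off the cache-store triggers described above): on each fact write, walk the foreign keys and copy every dimension actually reached into the replicated caches, skipping anything already there.

import java.util.*;
import java.util.function.Function;

// Hypothetical sketch of Connected Replication: when a fact (e.g. a Trade)
// is saved, recurse through its foreign keys and copy each dimension that is
// actually referenced ("connected") into the replicated dimension caches.
class ConnectedReplicator {
    // For each entity, a function returning the dimension entities it references.
    private final Function<Object, Collection<Object>> references;
    private final Set<Object> replicatedDimensionCache = new HashSet<>();

    ConnectedReplicator(Function<Object, Collection<Object>> references) {
        this.references = references;
    }

    void onFactSaved(Object fact) {
        for (Object dimension : references.apply(fact)) {
            replicate(dimension);
        }
    }

    private void replicate(Object dimension) {
        if (replicatedDimensionCache.add(dimension)) {       // skip anything already replicated
            for (Object next : references.apply(dimension)) {
                replicate(next);                              // recurse: Party -> LedgerBook, etc.
            }
        }
    }
}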

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)

Limitations of this approach

• Data set size: the size of connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability.

Conclusion

At the other end are in memory architectures, ideally using a single address space.

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com

• Questions?

Page 37: Advanced databases   ben stopford

The architecture of an in memory database

• All data is at your fingertips

• Query plans become less important as there is no IO

• Intermediary results are just pointers

Memory is at least 100x faster than disk

[Latency ladder diagram, spanning picoseconds to milliseconds: L1 cache ref, L2 cache ref, main memory ref, 1MB read from main memory, cross-network round trip, cross-continental round trip, 1MB read from disk/network.]

An L1 ref is about 2 clock cycles, or 0.7ns. This is the time it takes light to travel 20cm.

Random vs Sequential Access: Memory allows random access. Disk only works well for sequential reads.

This makes them very fast

The proof is in the stats: TPC-H benchmarks on a 1TB data set
• Exasol: 4,253,937 QphH (in-memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH

• NB – TPC-H is a decision support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's Sparc Supercluster.

So why haven't in-memory databases taken off?

Address spaces are relatively small and of a finite, fixed size

• What happens when your data grows beyond your available memory? The 'one more bit' problem.

Durability

What happens when you pull the plug?

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data, but this time only using RAM

[Diagram: keys 1, 2, 3… 97, 98, 99… 169, 170… 244, 245… 333, 334… 765, 769… spread across nodes, with a client connecting to the cluster.]

Distribution solves our two problems

• Solve the 'one more bit' problem by adding more hardware

• Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of losing the single address space

[Diagram: the spectrum of architectures – Traditional, Shared Disk, In Memory, Distributed In Memory, Shared Nothing – ordered towards a simpler contract.]

Key Point 4: There are three key forces

• Distribution: gain scalability through a distributed architecture

• Simplify the contract: improve scalability by picking appropriate ACID properties

• No disk: all data is held in RAM

These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse

ODC

ODC represents a balance between throughput and latency

What is Latency?

Latency is a measure of response time

What is Throughput?

Throughput is a measure of the amount of work (or messages) consumed in a prescribed amount of time

Which is best for latency?

[Diagram: latency comparison – Traditional Database vs Shared Nothing (Distributed) In-Memory Database.]

Which is best for throughput?

[Diagram: throughput comparison – Traditional Database vs Shared Nothing (Distributed) In-Memory Database.]

So why do we use distributed in-memory?

[Diagram: In Memory + plentiful hardware => latency and throughput.]

ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised

• Realtime Graph DB
• 450 processes
• 2TB of RAM
• Messaging (topic based) as the system of record (persistence)

The Layers

[Diagram: Access Layer (Java client APIs) over a Query Layer and a Data Layer (Transactions, Cashflows, Mtms), backed by a Persistence Layer.]

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools?

Replication puts data everywhere

Wherever you go, the data will be there. But your storage is limited by the memory on a node.

Partitioning scales
[Diagram: each partition owns a range of keys, e.g. keys Aa–Ap.]

Scalable storage, bandwidth and processing

Associating data in different partitions implies moving it
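As a rough illustration of the partitioning idea, here is a minimal sketch (not ODC's actual code; the class and method names are assumptions) of routing entries to partitions by hashing their key:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch: a partitioned "grid" where each key is owned by exactly one node.
public class PartitionedStore {
    private final List<Map<String, Object>> nodes = new ArrayList<>();

    public PartitionedStore(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Deterministic key -> partition mapping. Adding a node changes the mapping,
    // so data has to move: the cost of scaling a partitioned store.
    private int partitionFor(String key) {
        return Math.abs(key.hashCode() % nodes.size());
    }

    public void put(String key, Object value) {
        nodes.get(partitionFor(key)).put(key, value);
    }

    public Object get(String key) {
        return nodes.get(partitionFor(key)).get(key);
    }
}
```

Replication would instead write the same entry to every node: reads are then local anywhere, but capacity stays bounded by a single node's memory.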

So we have some data. Our data is bound together in a model.

[Domain model diagram: a Trade related to a Party and a Trader, with further sub-entities such as Desk, Name and Sub.]

Which we save.

[Diagram: the Trade, Party and Trader entities stored separately across the cluster.]

Binding them back together involves a "distributed join" => lots of network hops

[Diagram: joining Trade, Party and Trader entities that live on different machines.]

The hops have to be spread over time

[Diagram: successive network hops laid out along a time axis.]

Lots of network hops make it slow
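To put rough numbers on it (the figures below are illustrative assumptions, not measurements), even a fast LAN round trip dwarfs an in-process lookup, and a join that needs several dependent hops pays that cost serially:

```java
// Back-of-envelope: why chained network hops dominate a distributed join.
// All numbers are illustrative assumptions, not benchmarks.
public class JoinLatencySketch {
    public static void main(String[] args) {
        double networkRoundTripUs = 500.0; // assumed LAN round trip, in microseconds
        double inProcessLookupUs  = 0.05;  // assumed in-memory map lookup

        int dependentHops = 6;             // e.g. dimension -> dimension -> fact -> fact...

        double distributedJoinUs = dependentHops * networkRoundTripUs;
        double collocatedJoinUs  = dependentHops * inProcessLookupUs;

        System.out.printf("distributed join ~ %.0f us, collocated join ~ %.2f us%n",
                distributedJoinUs, collocatedJoinUs);
    }
}
```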

OK – what if we held it all together, "denormalised"?

Hence denormalisation is FAST (for reads)

Denormalisation implies the duplication of some sub-entities

…and that means managing consistency over lots of copies

…and all the duplication means you run out of space really quickly

Space issues are exacerbated further when data is versioned

[Diagram: four denormalised copies of the Trade–Party–Trader graph, Versions 1–4.]

…and you need versioning to do MVCC

And reconstituting a previous time slice becomes very difficult

[Diagram: overlapping Trade, Party and Trader versions that must be stitched back together.]

So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage

Remember, this means the object graph will be split across multiple machines

[Diagram: Trade, Party and Trader held on separate machines – independently versioned, each piece of data a singleton.]
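A minimal sketch of what 'independently versioned, singleton' entities look like in code (class and field names are assumptions for illustration): each normalised entity keeps its own version history, so a previous time slice is just a read at an older version, without duplicating the whole graph.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: each entity id maps to its own version history (MVCC-style reads).
public class VersionedStore<T> {
    private final Map<String, TreeMap<Long, T>> history = new ConcurrentHashMap<>();

    public void put(String id, long version, T value) {
        history.computeIfAbsent(id, k -> new TreeMap<>()).put(version, value);
    }

    // Read the entity as it was at (or before) the given version.
    public T getAsOf(String id, long version) {
        TreeMap<Long, T> versions = history.get(id);
        if (versions == null) return null;
        Map.Entry<Long, T> entry = versions.floorEntry(version);
        return entry == null ? null : entry.getValue();
    }
}
```

Because the Trade, Party and Trader histories are independent, a new Trade version does not force new copies of Party and Trader, unlike the denormalised model above.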

Binding them back together involves a "distributed join" => lots of network hops

[Diagram: the normalised Trade, Party and Trader entities spread over the cluster.]

Whereas in the denormalised model the join is already done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern is all about

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys

We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate

[Diagram: entities sharing common keys vs entities related by crosscutting keys.]

We tackle this problem with a hybrid model

[Diagram: Trade is partitioned; Party and Trader are replicated.]

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are big, dimensions are small

Facts have one key that relates them all (used to partition)

Dimensions have many keys (which crosscut the partitioning key)

Looking at the data:
Facts => big, common keys
Dimensions => small, crosscutting keys

We remember we are a grid. We should avoid the distributed join…

… so we only want to 'join' data that is in the same process

[Diagram: Trades and their MTMs collocated via a common key.]

Use a Key Assignment Policy (e.g. KeyAssociation in Coherence)
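A hedged sketch of the idea (not Coherence's actual API; the class and method names are invented for illustration): a fact key carries the partitioning key of the fact it belongs to, so MTMs hash to the same partition as their parent Trade and the join stays in-process.

```java
import java.util.Objects;

// Sketch of key association: an MTM key routes by its parent trade id,
// so Trades and their MTMs land in the same partition.
public final class MtmKey {
    private final String mtmId;
    private final String tradeId; // the partitioning ("associated") key

    public MtmKey(String mtmId, String tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    // Equivalent in spirit to a key association policy: route by this, not by the whole key.
    public String associatedKey() {
        return tradeId;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof MtmKey)) return false;
        MtmKey other = (MtmKey) o;
        return mtmId.equals(other.mtmId) && tradeId.equals(other.tradeId);
    }

    @Override public int hashCode() {
        return Objects.hash(mtmId, tradeId);
    }
}
```

A partitioner would then route an MtmKey by `associatedKey()` rather than by the whole key, e.g. `partitionFor(key.associatedKey())` using the earlier sketch.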

So we prescribe different physical storage for Facts and Dimensions

[Diagram: Trade is partitioned; Party and Trader are replicated.]

Facts are partitioned, dimensions are replicated

[Diagram: Data Layer with Fact Storage (partitioned) holding Transactions, Cashflows and Mtms; Query Layer holding the replicated dimensions Trade, Party and Trader.]

Facts (distribute / partition); Dimensions (replicate)

The data volumes back this up as a sensible hypothesis:
Facts => big => distribute
Dimensions => small => replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key

[Diagram: replicate the small, crosscutting entities; distribute the big, partitionable ones.]

So how do they help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern?

[Diagram: a chain of dependent calls spread over the network and over time – Get Cost Centres, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centres.]

But by balancing Replication and Partitioning we don't need all those hops

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'

[Diagram: partitioned storage holding Transactions, Cashflows and Mtms.]

Stage 1: Get the right keys to query the Facts – join Dimensions in the Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Cluster Join to get Facts – join the facts together efficiently, as we know they are collocated

[Diagram: partitioned storage holding Transactions, Cashflows and Mtms.]

Stage 3: Augment raw Facts with relevant Dimensions – bind the relevant dimensions to the result in the Query Layer
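A compressed sketch of the three stages (entity and method names are assumptions, not ODC's API): dimensions are joined locally in the query layer to produce fact keys, facts are then joined inside their shared partition, and finally replicated dimensions decorate the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of the three-stage query: names and shapes are illustrative only.
public class SnowflakeQuerySketch {

    record Transaction(String tradeId, String costCentreId, double amount) {}
    record Mtm(String tradeId, double value) {}
    record Result(Transaction txn, Mtm mtm, String costCentreName) {}

    // Stage 1: resolve the where-clause against replicated dimensions (local, in the query layer).
    static Set<String> costCentreIdsFor(Map<String, String> costCentres, String name) {
        return costCentres.entrySet().stream()
                .filter(e -> e.getValue().equals(name))
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }

    // Stage 2: join facts that share the partitioning key (tradeId), inside one partition.
    // Stage 3: bind replicated dimension data onto the joined facts.
    static List<Result> query(List<Transaction> txns, Map<String, Mtm> mtmsByTrade,
                              Map<String, String> costCentres, String costCentreName) {
        Set<String> ccIds = costCentreIdsFor(costCentres, costCentreName);   // stage 1
        List<Result> out = new ArrayList<>();
        for (Transaction t : txns) {
            if (!ccIds.contains(t.costCentreId())) continue;
            Mtm mtm = mtmsByTrade.get(t.tradeId());                          // stage 2 (collocated)
            out.add(new Result(t, mtm, costCentres.get(t.costCentreId())));  // stage 3
        }
        return out;
    }
}
```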

Bringing it together

[Diagram: Java client API over replicated Dimensions and partitioned Facts.]

We never have to do a distributed join

So all the big stuff is held partitioned, and we can join without shipping keys around and without intermediate results

We get to do this… [Diagram: the normalised Trade–Party–Trader graph]

…and this… [Diagram: independently versioned entities, Versions 1–4]

…and this… [Diagram: reconstituting a previous time slice]

…without the problems of this… …or this…

…all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts.

[Diagram: the schema split into Facts and Dimensions.]

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem

Fortunately there is a simple solution: the Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data, some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate 'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date

[Diagram: Data Layer – Fact Storage (partitioned): Transactions, Cashflows, Mtms; Processing Layer – Dimension Caches (replicated).]

As new Facts are added, relevant Dimensions that they reference are moved to the processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st-level references to be triggered

[Diagram: Save Trade hits the partitioned cache in the Data Layer (all normalised); a cache-store trigger fires for the Trade's references – Party, Alias, Source, Book, Ccy.]

This updates the connected caches

[Diagram: the referenced Party, Alias, Source, Book and Ccy dimensions are pushed to the Query Layer's connected dimension caches.]

The process recurses through the object graph

[Diagram: the recursion continues from Party on to LedgerBook and further references.]

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)
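A minimal sketch of the recursion (the domain interface and cache shape are assumptions for illustration, not ODC's code): when a fact is saved, walk its references, push each dimension into the replicated 'connected' cache, and recurse until the reachable graph is exhausted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the Connected Replication pattern: replicate only dimensions
// actually reachable from stored facts.
public class ConnectedReplication {

    interface Entity {
        String id();
        List<Entity> references();   // the "arcs" on the domain model
    }

    private final Map<String, Entity> connectedDimensionCache = new HashMap<>();

    // Triggered when a fact (e.g. a Trade) is written to the partitioned store.
    public void onFactSaved(Entity fact) {
        Set<String> visited = new HashSet<>();
        for (Entity dimension : fact.references()) {
            replicateConnected(dimension, visited);
        }
    }

    private void replicateConnected(Entity dimension, Set<String> visited) {
        if (!visited.add(dimension.id())) return;                 // already handled on this pass
        connectedDimensionCache.put(dimension.id(), dimension);   // push to the replicated layer
        for (Entity next : dimension.references()) {
            replicateConnected(next, visited);                    // recurse through the object graph
        }
    }
}
```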

Limitations of this approach

• Data set size: the size of the connected dimensions limits scalability

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability.

Conclusion

At the other end are in-memory architectures, ideally using a single address space.

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning we can do any join in a single step

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern

The End
• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 38: Advanced databases   ben stopford

Memory is at least 100x faster than disk

0000000000000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB DiskNetwork

L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 39: Advanced databases   ben stopford

Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 40: Advanced databases   ben stopford

This makes them very fast

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4: There are three key forces:
• Distribution – gain scalability through a distributed architecture.
• Simplify the contract – improve scalability by picking appropriate ACID properties.
• No disk – all data is held in RAM.

These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse.

ODC

ODC represents a balance between throughput and latency.

What is latency? Latency is a measure of response time.

What is throughput? Throughput is a measure of the amount of work (messages) consumed in a prescribed amount of time.

Which is best for latency: the traditional database or the shared-nothing (distributed) in-memory database?

Which is best for throughput: the traditional database or the shared-nothing (distributed) in-memory database?

So why do we use distributed in-memory? Being in memory gives us latency; plentiful hardware gives us throughput.

ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised
• Realtime graph DB
• 450 processes
• 2TB of RAM
• Messaging (topic based) as a system of record (persistence)

The Layers
• Access Layer – Java client API
• Query Layer
• Data Layer – Transactions, Cashflows, Mtms
• Persistence Layer

Three Tools of Distributed Data Architecture: Indexing, Replication and Partitioning. How should we use these tools?

Replication puts data everywhere: wherever you go, the data will be there. But your storage is limited by the memory on a node.

Partitioning scales (keys Aa-Ap on one node, and so on): scalable storage, bandwidth and processing. But associating data in different partitions implies moving it.

So we have some data. Our data is bound together in a model.

[Diagram: the object model – a Trade linked to a Party and a Trader, with further entities such as Desk, Name and Sub.]

Which we save.

[Diagram: the saved Trade, Party and Trader objects end up on different machines in the grid.]

Binding them back together involves a "distributed join" => lots of network hops.

The hops have to be spread over time across the network.

Lots of network hops makes it slow.

OK – what if we held it all together, "denormalised"?

Hence denormalisation is FAST (for reads).

Denormalisation implies the duplication of some sub-entities…

…and that means managing consistency over lots of copies…

…and all the duplication means you run out of space really quickly.

Space issues are exacerbated further when data is versioned: every version of a Trade (Version 1, Version 2, Version 3, Version 4, …) duplicates its Party and Trader.

…and you need versioning to do MVCC.

And reconstituting a previous time slice becomes very difficult.
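
To make the versioning point concrete, here is a small, generic sketch of MVCC-style versioned storage (hypothetical types, not ODC code): each write adds a new version rather than overwriting, and a read resolves the latest version at or before a requested point, which is what reconstituting a time slice requires:

    import java.util.*;

    // Hypothetical versioned store: every write keeps the old versions.
    class VersionedStore<K, V> {
        private final Map<K, NavigableMap<Long, V>> versions = new HashMap<>();

        void put(K key, long version, V value) {
            versions.computeIfAbsent(key, k -> new TreeMap<>()).put(version, value);
        }

        // Read the value as of 'asOfVersion', e.g. to rebuild a past time slice.
        Optional<V> getAsOf(K key, long asOfVersion) {
            NavigableMap<Long, V> history = versions.get(key);
            if (history == null) return Optional.empty();
            Map.Entry<Long, V> entry = history.floorEntry(asOfVersion);
            return entry == null ? Optional.empty() : Optional.of(entry.getValue());
        }
    }

In a denormalised model every one of those versions carries its own copies of the sub-entities, which is where the space problem comes from.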

So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage.

Remember, this means the object graph will be split across multiple machines. Each entity is independently versioned, and the data is a singleton.

Binding them back together involves a "distributed join" => lots of network hops.

Whereas in the denormalised model the join is already done.

So what we want is the advantages of a normalised store at the speed of a denormalised one. This is what using Snowflake Schemas and the Connected Replication pattern is all about.

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys. We can collocate data with common keys, but if the keys crosscut, the only way to collocate is to replicate.

We tackle this problem with a hybrid model: the Trade is partitioned, while the Party and Trader are replicated.

We adapt the concept of a Snowflake Schema, taking the concept of Facts and Dimensions. Everything starts from a Core Fact (Trades, for us). Facts are big; dimensions are small.

Facts have one key that relates them all (used to partition). Dimensions have many keys (which crosscut the partitioning key).

Looking at the data: Facts => big, common keys. Dimensions => small, crosscutting keys.

We remember we are a grid. We should avoid the distributed join…

… so we only want to 'join' data that is in the same process. Trades and MTMs share a common key, so we use a key assignment policy (e.g. KeyAssociation in Coherence) to keep them in the same process – sketched below.
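
As a rough sketch of such a policy (class and field names here are illustrative, not the ODC model), a fact key can implement Coherence's KeyAssociation interface so that each MTM entry is stored in the same partition as its parent Trade:

    import com.tangosol.net.cache.KeyAssociation;
    import java.io.Serializable;

    // Key for an MTM fact. Returning the parent trade id from
    // getAssociatedKey() asks Coherence to place this entry in the same
    // partition as the Trade with that id, so the Trade/MTM join stays local.
    public class MtmKey implements KeyAssociation, Serializable {
        private final String mtmId;
        private final String tradeId;   // the common, partitioning key

        public MtmKey(String mtmId, String tradeId) {
            this.mtmId = mtmId;
            this.tradeId = tradeId;
        }

        public Object getAssociatedKey() {
            return tradeId;             // collocate with the parent Trade
        }

        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof MtmKey)) return false;
            MtmKey other = (MtmKey) o;
            return mtmId.equals(other.mtmId) && tradeId.equals(other.tradeId);
        }

        public int hashCode() {
            return mtmId.hashCode() * 31 + tradeId.hashCode();
        }
    }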

So we prescribe different physical storage for Facts and Dimensions: the Trade (a Fact) is partitioned; the Party and Trader (Dimensions) are replicated.

Facts are partitioned; dimensions are replicated.

[Diagram: the Data Layer provides partitioned Fact storage (Transactions, Cashflows, Mtms – facts are distributed/partitioned), while the Query Layer holds the Dimensions in replicated caches.]

The data volumes back this up as a sensible hypothesis: Facts => big => distribute; Dimensions => small => replicate.
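
In Coherence terms this comes down to backing the fact caches with a partitioned (distributed) scheme and the dimension caches with a replicated scheme in the cache configuration. A minimal sketch (the cache names, and their mapping onto schemes, are illustrative assumptions, not ODC's actual configuration):

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;

    public class OdcStyleCaches {
        public static void main(String[] args) {
            // Assumed to map onto a distributed (partitioned) scheme in the cache config:
            NamedCache trades = CacheFactory.getCache("partitioned-trades");
            // Assumed to map onto a replicated scheme in the cache config:
            NamedCache parties = CacheFactory.getCache("replicated-parties");

            trades.put("trade-1", "big fact, stored on one partition");
            parties.put("party-A", "small dimension, copied to every node");
        }
    }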

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.

So how do they help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern? A chain of lookups spread over the network and over time: get Cost Centers, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs, get Cost Centers…

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'.

Stage 1: Get the right keys to query the Facts: join the Dimensions in the Query Layer to resolve the where clause into the keys of the Facts held in partitioned storage (Transactions, Cashflows, Mtms).

Stage 2: Cluster join to get the Facts: join the Facts across the cluster, which is efficient as we know they are collocated in the same partition of the partitioned storage (Transactions, Cashflows, Mtms).

Stage 3: Augment the raw Facts with the relevant Dimensions: bind the relevant dimensions to the result, back in the Query Layer.
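
Putting the three stages together in plain Java (a library-free sketch; the types, maps and field names are illustrative stand-ins for the replicated dimension caches and partitioned fact storage, not the ODC API):

    import java.util.*;
    import java.util.stream.Collectors;

    class ThreeStageQuery {
        // Replicated dimension data: every query node holds a full copy.
        Map<String, String> costCentreByBook = new HashMap<>();
        // Partitioned fact storage, keyed by the common partitioning key (trade id).
        Map<String, List<Fact>> factsByTradeId = new HashMap<>();

        List<EnrichedFact> run(String costCentre) {
            // Stage 1: resolve the where clause against replicated dimensions
            // to get the keys needed to query the facts (no network hop).
            Set<String> books = costCentreByBook.entrySet().stream()
                    .filter(e -> e.getValue().equals(costCentre))
                    .map(Map.Entry::getKey)
                    .collect(Collectors.toSet());

            // Stage 2: join the facts; they share a partitioning key, so in the
            // grid this join runs inside each partition, in parallel.
            List<Fact> facts = factsByTradeId.values().stream()
                    .flatMap(List::stream)
                    .filter(f -> books.contains(f.bookId))
                    .collect(Collectors.toList());

            // Stage 3: bind the relevant dimensions to the result in the query layer.
            return facts.stream()
                    .map(f -> new EnrichedFact(f, costCentreByBook.get(f.bookId)))
                    .collect(Collectors.toList());
        }

        static class Fact { String tradeId; String bookId; }
        static class EnrichedFact {
            final Fact fact; final String costCentre;
            EnrichedFact(Fact fact, String costCentre) { this.fact = fact; this.costCentre = costCentre; }
        }
    }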

Bringing it together: a Java client API in front of replicated Dimensions and partitioned Facts.

We never have to do a distributed join. So all the big stuff is held partitioned, and we can join without shipping keys around and without intermediate results.

We get to do this… (hold the object graph normalised, as singletons split across machines) …and this… (version each entity independently for MVCC) …and this… (reconstitute a previous time slice) …without the problems of duplication and consistency over lots of copies, or of running out of space… all at the speed of the denormalised, in-process case… well, almost.

But there is a fly in the ointment…

I lied earlier. These aren't all Facts.

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication Pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected". If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large. But Connected Dimension Data is tiny by comparison. One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date.

[Diagram: the Data Layer holds the partitioned Fact storage (Transactions, Cashflows, Mtms); the Processing Layer holds the replicated Dimension caches.]

As new Facts are added, the relevant Dimensions that they reference are moved to the processing layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.

Saving a trade causes all its 1st-level references to be triggered: the Save Trade into the partitioned cache in the Data Layer (all normalised) fires a cache-store trigger for the dimensions the Trade references (Party, Alias, SourceBook, Ccy), feeding the connected dimension caches in the Query Layer.

This updates the connected caches in the Query Layer with those first-level dimensions.

The process recurses through the object graph: the first-level dimensions in turn trigger the dimensions they reference (for example the Party's LedgerBook), and so on.

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
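
A minimal sketch of that recursion (the types and method names are illustrative, not ODC code): when a fact is saved, walk its references and copy every dimension reached into the replicated 'connected dimension' cache, guarding against cycles in the graph:

    import java.util.*;

    class ConnectedReplicator {

        interface Entity {
            Object key();
            List<Entity> references();   // the arcs of the domain model
        }

        // Stand-in for the replicated connected-dimension caches in the query layer.
        private final Map<Object, Entity> connectedDimensionCache = new HashMap<>();

        // Called by the cache-store trigger when a fact (e.g. a Trade) is written.
        void onFactSaved(Entity fact) {
            for (Entity dimension : fact.references()) {
                replicateConnected(dimension, new HashSet<>());
            }
        }

        private void replicateConnected(Entity dimension, Set<Object> visited) {
            if (!visited.add(dimension.key())) {
                return;                            // already visited on this pass
            }
            connectedDimensionCache.put(dimension.key(), dimension);
            for (Entity next : dimension.references()) {
                replicateConnected(next, visited); // recurse through the object graph
            }
        }
    }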

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step against partitioned storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End
• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 41: Advanced databases   ben stopford

The proof is in the stats TPC-H Benchmarks on a

1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH

bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 42: Advanced databases   ben stopford

So why havenrsquot in-memory databases

taken off

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 43: Advanced databases   ben stopford

Address-Spaces are relatively small and of a

finite fixed size

bull What happens when your data grows beyond your available memory

The lsquoOne more bit problemrsquo

Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key

Replicate

Distribute

So how do they help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern

(Diagram: a chain of calls spread over the network and over time: Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centers)

But by balancing Replication and Partitioning we don't need all those hops

Stage 1: Focus on the where clause

Where Cost Centre = 'CC1'

(Diagram: Transactions, Cashflows and MTMs in partitioned storage)

Stage 1: Get the right keys to query the Facts

Join Dimensions in the Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

(Diagram: Transactions, Cashflows and MTMs in partitioned storage)

Stage 2: Cluster Join to get Facts

Join Dimensions in the Query Layer

Join Facts across the cluster

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated

(Diagram: Transactions, Cashflows and MTMs in partitioned storage)

Stage 3: Augment raw Facts with relevant Dimensions

Join Dimensions in the Query Layer

Join Facts across the cluster

Join Dimensions in the Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result
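
The three stages compressed into one self-contained Java sketch (entity shapes and names such as costCentreBySourceBook are illustrative assumptions, with plain maps standing in for the replicated and partitioned caches):

import java.util.*;
import java.util.stream.*;

public class SnowflakeQuerySketch {
    static class Transaction {
        final String id; final String sourceBook;
        Transaction(String id, String sourceBook) { this.id = id; this.sourceBook = sourceBook; }
    }
    static class Mtm {
        final String transactionId; final double value;
        Mtm(String transactionId, double value) { this.transactionId = transactionId; this.value = value; }
    }

    public static void main(String[] args) {
        // Replicated dimension data, present in the query layer on every node.
        Map<String, String> costCentreBySourceBook = new HashMap<>();
        costCentreBySourceBook.put("BOOK-A", "CC1");
        costCentreBySourceBook.put("BOOK-B", "CC2");

        // Partitioned facts: Transactions and MTMs collocated by transaction id.
        Map<String, Transaction> transactions = new HashMap<>();
        transactions.put("T1", new Transaction("T1", "BOOK-A"));
        transactions.put("T2", new Transaction("T2", "BOOK-B"));
        Map<String, Mtm> mtms = new HashMap<>();
        mtms.put("T1", new Mtm("T1", 100.0));
        mtms.put("T2", new Mtm("T2", -50.0));

        // Stage 1: join dimensions locally to get the fact keys for 'CC1'.
        Set<String> booksForCC1 = costCentreBySourceBook.entrySet().stream()
                .filter(e -> "CC1".equals(e.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());

        // Stage 2: join the collocated facts (Transaction -> MTM) inside the partition.
        // Stage 3: bind the relevant dimension data back onto each result row.
        List<String> rows = transactions.values().stream()
                .filter(t -> booksForCC1.contains(t.sourceBook))
                .map(t -> t.id + " mtm=" + mtms.get(t.id).value
                        + " costCentre=" + costCentreBySourceBook.get(t.sourceBook))
                .collect(Collectors.toList());

        rows.forEach(System.out::println);   // prints: T1 mtm=100.0 costCentre=CC1
    }
}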

Bringing it together

(Diagram: a Java client API over Replicated Dimensions and Partitioned Facts)

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and having intermediate results

We get to do this…

(Diagram: normalised Trade, Party and Trader entities held across the cluster)

…and this…

(Diagram: Trade, Party and Trader at Versions 1 to 4)

and this

(Diagram: many Trades, Parties and Traders related across the store)

…without the problems of this…

…or this

all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts

Facts

Dimensions

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate 'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date

(Diagram: Data Layer with Transactions, Cashflows and MTMs in partitioned Fact Storage; Processing Layer with replicated Dimension Caches)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st level references to be triggered

(Diagram: Save Trade hits the Partitioned Cache in the Data Layer (all normalised); a Cache Store trigger pushes the Trade's first-level references, Party, Alias, Source Book and Ccy, towards the Query Layer's connected dimension caches)

This updates the connected caches

(Diagram: the Trade's first-level references, Party, Alias, Source Book and Ccy, now sit in the Query Layer's connected dimension caches)

The process recurses through the object graph

(Diagram: the recursion continues through the graph, Party pulling in LedgerBook and so on, from the Data Layer (all normalised) into the Query Layer's connected dimension caches)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
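
A minimal sketch of that recursion (illustrative types, not the ODC implementation): when a fact is written, walk its references and copy every dimension reached into the replicated 'connected' cache, with a visited set guarding against cycles in the graph. In ODC terms the onFactSaved hook would sit behind the cache-store trigger described above; here it is just a plain method.

import java.util.*;

public class ConnectedReplicationSketch {
    interface Entity { String key(); List<Entity> references(); }

    static final class Node implements Entity {
        final String key; final List<Entity> refs;
        Node(String key, Entity... refs) { this.key = key; this.refs = Arrays.asList(refs); }
        public String key() { return key; }
        public List<Entity> references() { return refs; }
    }

    // The replicated cache that only ever holds 'connected' dimensions.
    static final Map<String, Entity> connectedDimensions = new HashMap<>();

    // Called (e.g. by a cache-store trigger) whenever a fact is written.
    static void onFactSaved(Entity fact) {
        Set<String> visited = new HashSet<>();
        for (Entity dim : fact.references()) replicate(dim, visited);
    }

    private static void replicate(Entity dim, Set<String> visited) {
        if (!visited.add(dim.key())) return;       // already walked (guards against cycles)
        connectedDimensions.put(dim.key(), dim);   // push to the replicated layer
        for (Entity next : dim.references()) replicate(next, visited);  // Party -> LedgerBook, ...
    }

    public static void main(String[] args) {
        Entity ledgerBook = new Node("ledgerBook-7");
        Entity party = new Node("party-GS", ledgerBook);
        Entity ccy = new Node("ccy-USD");
        onFactSaved(new Node("trade-42", party, ccy));     // saving the fact...
        System.out.println(connectedDimensions.keySet());  // ...replicates only what it touches
    }
}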

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)

Limitations of this approach

• Data set size: the size of the connected dimensions limits scalability

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning we can do any join in a single step

Partitioned Storage

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern

The End

• Further details online: http://www.benstopford.com

• Questions


Durability

What happens when you pull the plug

>

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 45: Advanced databases   ben stopford

One solution is distribution

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 46: Advanced databases   ben stopford

Distributed In Memory (Shared Nothing)

Again we spread our data but this time only using RAM

765 769hellip

1 2 3hellip 97 98 99hellip

333 334hellip 244 245hellip

169 170hellipClient

Distribution solves our two problems

bull Solve the lsquoone more bitrsquo problem by adding more hardware

bull Solve the durability problem with backups on another machine

We get massive amounts of parallel processing

But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions








But at the cost of

loosing the single

address space

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency?

(Spectrum: Traditional Database vs Shared Nothing (Distributed) In-Memory Database)

Which is best for throughput?

(Spectrum: Traditional Database vs Shared Nothing (Distributed) In-Memory Database)

So why do we use distributed in-memory?

(In Memory plus plentiful hardware addresses both Latency and Throughput)

ODC - Distributed, Shared Nothing, In Memory, Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record (persistence)

2TB of RAM

The Layers: Access Layer (Java client API), Query Layer, Data Layer (Transactions, Cashflows, Mtms), Persistence Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scales (e.g. keys Aa-Ap on one node, and so on)

Scalable storage, bandwidth and processing

Associating data in different partitions implies moving it

So we have some data. Our data is bound together in a model (Trade, Party, Trader, Desk, Name, Sub).

Which we save: the Trade, Party and Trader entities are stored separately.

Binding them back together involves a "distributed join" => lots of network hops (the Trade, Party and Trader live in different partitions).

The hops have to be spread over time across the network.

Lots of network hops make it slow.

OK - what if we held it all together, "Denormalised"?

Hence denormalisation is FAST (for reads).

Denormalisation implies the duplication of some sub-entities

…and that means managing consistency over lots of copies

…and all the duplication means you run out of space really quickly

Space issues are exacerbated further when data is versioned

(Trade, Party, Trader duplicated again for Version 1, Version 2, Version 3, Version 4)

…and you need versioning to do MVCC
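An illustrative back-of-the-envelope (assumed sizes, not ODC measurements): suppose a Trade is 1 KB and the Party and Trader it references total 4 KB. Denormalised, each of 10 versions carries its own copy of that 4 KB, so the history costs about 10 x (1 + 4) = 50 KB, of which roughly 40 KB is duplicated dimension data. Normalised, the 10 trade versions plus one shared copy of the dimensions cost about 10 x 1 + 4 = 14 KB.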

And reconstituting a previous time slice becomes very difficult.

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

(Trade, Party, Trader held separately: independently versioned, each datum held as a singleton)

Binding them back together involves a "distributed join" => lots of network hops.

Whereas in the denormalised model the join is already done.

So what we want are the advantages of a normalised store at the speed of a denormalised one.

This is what using Snowflake Schemas and the Connected Replication pattern is all about.

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys.

We can collocate data with common keys, but if the keys crosscut, the only way to collocate is to replicate. (Common Keys vs Crosscutting Keys)

We tackle this problem with a hybrid model: Trade (partitioned); Party, Trader (replicated).

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are big, dimensions are small

Facts have one key that relates them all (used to partition)

Dimensions have many keys (which crosscut the partitioning key)

Looking at the data:

Facts => big, common keys

Dimensions => small, crosscutting keys

We remember we are a grid. We should avoid the distributed join

… so we only want to 'join' data that is in the same process.

Trades and MTMs share a Common Key. Use a Key Assignment Policy (e.g. KeyAssociation in Coherence), as sketched below.
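A minimal sketch of such a key, assuming Coherence's KeyAssociation interface; the class and field names (MtmKey, tradeId) are illustrative, not ODC's real types:

import com.tangosol.net.cache.KeyAssociation;

// The key for an MTM fact. Coherence stores an entry whose key implements
// KeyAssociation in the partition that owns the associated key, so every MTM
// lands in the same partition as the Trade it belongs to and the 'join'
// can happen in-process.
public class MtmKey implements KeyAssociation {

    private final String mtmId;
    private final String tradeId;  // the common key shared with the Trade fact

    public MtmKey(String mtmId, String tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;  // route this entry to the Trade's partition
    }

    // Real cache keys also need equals(), hashCode() and serialisation;
    // omitted here to keep the sketch short.
}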

So we prescribe different physical storage for Facts and Dimensions: Trade (partitioned); Party, Trader (replicated).

Facts are partitioned, dimensions are replicated.

(Diagram: in the Data Layer, Transactions, Cashflows and Mtms sit in Fact Storage (Partitioned) - Facts (distribute / partition); the Query Layer holds Dimensions (replicate).)

The data volumes back this up as a sensible hypothesis:

Facts => big => distribute

Dimensions => small => replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key. (Distribute the former, replicate the latter.)
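A hedged sketch of what that hybrid looks like to client code, assuming Coherence named caches; the cache names, and the mapping of those names onto a distributed (partitioned) scheme and a replicated scheme in the cache configuration, are illustrative here rather than ODC's real setup:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class HybridStorageSketch {
    public static void main(String[] args) {
        // Assumed to be mapped to a distributed (partitioned) scheme:
        // big Facts are spread across the grid by their partitioning key.
        NamedCache trades = CacheFactory.getCache("facts-trades");

        // Assumed to be mapped to a replicated scheme: small Dimensions
        // are copied to every node so they can always be joined locally.
        NamedCache counterparties = CacheFactory.getCache("dims-counterparties");

        trades.put("TRADE-1", "trade payload");           // partitioned across the grid
        counterparties.put("CP-1", "counterparty payload"); // copied to every node
    }
}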

So how do they help us to run queries without distributed joins?

This query involves: joins between Dimensions; joins between Facts. (A plain-Java sketch of the three stages follows Stage 3 below.)

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern? A chain of lookups spread over the network in time: Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centers.

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'

(Partitioned Storage: Transactions, Cashflows, Mtms)

Stage 1: Get the right keys to query the Facts - join Dimensions in the Query Layer.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

(Partitioned Storage: Transactions, Cashflows, Mtms)

Stage 2: Cluster Join to get Facts - Dimensions joined in the Query Layer, Facts joined across the cluster.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated.

(Partitioned Storage: Transactions, Cashflows, Mtms)

Stage 3: Augment raw Facts with relevant Dimensions - Dimensions joined in the Query Layer, Facts joined across the cluster, then Dimensions joined again in the Query Layer.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result.
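To make the three stages concrete, here is a plain-Java sketch of the same flow; all of the types and maps (CostCentre, txByTradeId and so on) are hypothetical stand-ins for ODC's caches, and on the real grid Stage 2 runs inside each partition rather than in a single JVM:

import java.util.*;
import java.util.stream.*;

public class ThreeStageQuerySketch {

    record CostCentre(String name, Set<String> tradeIds) {}    // replicated dimension
    record Transaction(String tradeId, double amount) {}       // partitioned fact
    record Mtm(String tradeId, double value) {}                 // partitioned fact, collocated
    record Row(Transaction tx, Mtm mtm, String costCentre) {}   // query result

    Map<String, CostCentre> costCentres = new HashMap<>();         // dimension cache
    Map<String, List<Transaction>> txByTradeId = new HashMap<>();  // fact storage
    Map<String, List<Mtm>> mtmByTradeId = new HashMap<>();         // fact storage

    List<Row> query(String costCentreName) {
        // Stage 1: resolve the where clause against replicated dimensions,
        // yielding the partitioning keys of the facts we need.
        Set<String> tradeIds = costCentres.values().stream()
                .filter(cc -> cc.name().equals(costCentreName))
                .flatMap(cc -> cc.tradeIds().stream())
                .collect(Collectors.toSet());

        // Stage 2: join fact to fact; both maps are keyed by tradeId, so on the
        // grid this join happens inside a single partition with no data shipping.
        List<Row> rows = new ArrayList<>();
        for (String tradeId : tradeIds) {
            for (Transaction tx : txByTradeId.getOrDefault(tradeId, List.of())) {
                for (Mtm mtm : mtmByTradeId.getOrDefault(tradeId, List.of())) {
                    // Stage 3: bind the relevant (replicated) dimension data
                    // to the result in the query layer.
                    rows.add(new Row(tx, mtm, costCentreName));
                }
            }
        }
        return rows;
    }
}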

Bringing it together: Java client API, Replicated Dimensions, Partitioned Facts.

We never have to do a distributed join.

So all the big stuff is held partitioned, and we can join without shipping keys around and without intermediate results.

We get to do this… (hold Trade, Party, Trader separately)

…and this… (version them independently: Version 1, 2, 3, 4)

…and this (keep the full object graph)

…without the problems of this…

…or this

…all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts. (Facts vs Dimensions)

This is a dimension: it has a different key to the Facts, and it's BIG.

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication Pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large.

But Connected Dimension Data is tiny by comparison.

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' (or 'Used') dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date.

(Data Layer: Fact Storage (Partitioned) with Transactions, Cashflows, Mtms; Processing Layer: Dimension Caches (Replicated).)

As new Facts are added, relevant Dimensions that they reference are moved to processing-layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st-level references to be triggered.

(Save Trade hits the Partitioned Cache in the Data Layer (All Normalised); a Cache Store trigger picks up the Trade's references - Party, Alias, Source, Book, Ccy - for the Query Layer's connected dimension caches.)

This updates the connected caches.

(The referenced dimensions - Party, Alias, Source, Book, Ccy - now sit in the Query Layer (with connected dimension caches), while the Data Layer stays fully normalised.)

The process recurses through the object graph.

(e.g. Party leads on to LedgerBook, which is pulled into the connected caches too.)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
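A minimal sketch of that recursion, with hypothetical types (Entity, the connectedDimensions map) standing in for ODC's real trigger and caches:

import java.util.*;

public class ConnectedReplicationSketch {

    /** A node in the domain model; references() are the foreign-key arcs. */
    interface Entity {
        Object key();
        List<Entity> references();
    }

    // Stand-in for the replicated 'connected dimension' caches.
    private final Map<Object, Entity> connectedDimensions = new HashMap<>();

    /** Called by the cache-store trigger when a fact (e.g. a Trade) is saved. */
    public void onFactSaved(Entity fact) {
        for (Entity dimension : fact.references()) {
            replicateIfAbsent(dimension);   // 1st-level references are triggered
        }
    }

    private void replicateIfAbsent(Entity dimension) {
        if (connectedDimensions.containsKey(dimension.key())) {
            return;                          // already connected, stop recursing
        }
        connectedDimensions.put(dimension.key(), dimension);
        for (Entity next : dimension.references()) {
            replicateIfAbsent(next);         // recurse through the object graph
        }
    }
}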

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach

• Data set size: the size of connected dimensions limits scalability

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.

• At one end of the scale are the huge shared nothing architectures. These favour scalability.

• At the other end are in-memory architectures, ideally using a single address space.

• You can blend the two approaches (for example ODC).

• ODC attacks the Distributed Join Problem in an unusual way:

• by balancing Replication and Partitioning, so we can do any join in a single step against partitioned storage,

• with a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com

• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 51: Advanced databases   ben stopford

Traditional

Distributed In

Memory

Shared Disk

In Memory

Shared Nothing

Simpler Contract

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 52: Advanced databases   ben stopford

Key Point 4There are three key forces

Distribution

Gain scalability through a distributed architecture

Simplify the

contract

Improve scalability by picking appropriate ACID properties

No Disk

All data is held in RAM

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 53: Advanced databases   ben stopford

These three non-functional

themes lay behind the design of ODC RBSrsquos in-memory data warehouse

ODC

ODC represents a

balance between

throughput and latency

What is Latency

Latency is a measure of response time

What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory

In Memory

Plentiful hardwar

e

Latency Throughput

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store, we keep our 'Connected Caches' up to date

Data Layer

Dimension Caches (Replicated)

Transactions

Cashflows

Processing Layer

Mtms

Fact Storage (Partitioned)

As new Facts are added, relevant Dimensions that they reference are moved to the processing layer caches

The Replicated Layer is updated by recursing through the arcs of the domain model when facts change

Saving a trade causes all its first-level references to be triggered

Trade

Party

Alias

Source Book

Ccy

Data Layer (All Normalised)

Query Layer (with connected dimension caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source Book

Ccy

Data Layer (All Normalised)

Query Layer (with connected dimension caches)

The process recurses through the object graph

Trade

Party

Alias

Source Book

Ccy

Party

LedgerBook

Data Layer (All Normalised)

Query Layer (with connected dimension caches)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)
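A minimal sketch of that recursion, assuming a simplified domain object that can expose its foreign-key references (the trigger hook and names are stand-ins for the real cache-store trigger, not ODC's actual API):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // When a fact is saved, walk its references and copy any dimension that is not
    // already replicated into the connected dimension cache, then recurse.
    public class ConnectedReplicator {

        /** Simplified domain object: a key plus the arcs (foreign keys) it points at. */
        public interface DomainObject {
            Object key();
            List<DomainObject> references();
        }

        // Query-layer cache holding only the 'connected' dimensions
        private final Map<Object, DomainObject> connectedDimensions = new HashMap<>();

        /** Called by the cache-store trigger whenever a fact (e.g. a Trade) is written. */
        public void onFactSaved(DomainObject fact) {
            replicateReferences(fact);
        }

        private void replicateReferences(DomainObject obj) {
            for (DomainObject dim : obj.references()) {
                if (connectedDimensions.containsKey(dim.key())) continue; // already connected
                connectedDimensions.put(dim.key(), dim);
                replicateReferences(dim); // recurse: Trade -> SourceBook -> LedgerBook, etc.
            }
        }
    }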

Limitations of this approach

• Data set size: the size of the connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability.

Conclusion

At the other end are in-memory architectures, ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning, we can do any join in a single step

Partitioned Storage

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern

The End

• Further details online: http://www.benstopford.com

• Questions?








What is Throughput

Throughput is a measure of the consumption of workmessages in a prescribed amount of time

Which is best for latency

Latency

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

Which is best for throughput

Throughput

Traditional

Database

Shared Nothing (Distribut

ed) In-Memory

Database

So why do we use distributed in-memory? In memory => latency; plentiful hardware => throughput.

ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised: a realtime graph DB; 450 processes; 2TB of RAM; Messaging (Topic Based) as the system of record (persistence).

The Layers: an Access Layer (Java client API), a Query Layer, a Data Layer (Transactions, Cashflows, MTMs) and a Persistence Layer.

Three Tools of Distributed Data Architecture: Indexing, Replication, Partitioning.

How should we use these tools?

Replication puts data everywhere: wherever you go, the data will be there. But your storage is limited by the memory on a node.

Partitioning scales (keys Aa-Ap on one node, and so on): scalable storage, bandwidth and processing. But associating data in different partitions implies moving it.
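
To make that trade-off concrete, here is a minimal sketch (illustrative only; this is not ODC or Coherence code, and the class and node counts are assumptions): a replicated map copies every entry to every node, while a partitioned map places each entry on exactly one node chosen from its key, so storage scales with the cluster but related keys may land on different nodes.

  import java.util.*;

  // Illustrative sketch: contrasts replication with partitioning across a tiny cluster.
  class TinyCluster {
      static final int NODES = 4;

      // Replication: every node holds a full copy of the map.
      static List<Map<String, String>> replicate(Map<String, String> data) {
          List<Map<String, String>> nodes = new ArrayList<>();
          for (int i = 0; i < NODES; i++) nodes.add(new HashMap<>(data));
          return nodes; // storage per node = the whole data set
      }

      // Partitioning: each key lives on exactly one node, chosen by hashing the key.
      static List<Map<String, String>> partition(Map<String, String> data) {
          List<Map<String, String>> nodes = new ArrayList<>();
          for (int i = 0; i < NODES; i++) nodes.add(new HashMap<>());
          data.forEach((k, v) ->
              nodes.get(Math.floorMod(k.hashCode(), NODES)).put(k, v));
          return nodes; // storage per node ~ data set / NODES, but joining two keys
                        // may mean touching two different nodes
      }
  }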

So we have some data. Our data is bound together in a model: a Trade references a Party and a Trader, which in turn reference smaller entities such as Desk and Name.

Which we save: the Trade, Party and Trader objects are written to the store.

Binding them back together involves a "distributed join" => lots of network hops.

The hops have to be spread over time, one after another on the network. Lots of network hops make it slow.

OK – what if we held it all together, "denormalised"? Hence denormalisation is FAST (for reads).

Denormalisation implies the duplication of some sub-entities…

…and that means managing consistency over lots of copies…

…and all the duplication means you run out of space really quickly.

Space issues are exacerbated further when data is versioned…

(Each version – Version 1, Version 2, Version 3, Version 4 – duplicates the whole Trade/Party/Trader blob.)

…and you need versioning to do MVCC.
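
A minimal sketch of what "versioning" means here (class and field names are illustrative assumptions, not ODC's API): every write is stored under a (business key, version) pair rather than overwriting, so readers can pin a consistent version of the data.

  // Illustrative only: a cache key that carries a version, so old values are never
  // overwritten and MVCC-style reads remain possible.
  final class VersionedKey {
      final String businessKey; // e.g. the trade id
      final int version;        // incremented on every write

      VersionedKey(String businessKey, int version) {
          this.businessKey = businessKey;
          this.version = version;
      }

      @Override public boolean equals(Object o) {
          if (!(o instanceof VersionedKey)) return false;
          VersionedKey k = (VersionedKey) o;
          return version == k.version && businessKey.equals(k.businessKey);
      }
      @Override public int hashCode() { return businessKey.hashCode() * 31 + version; }
  }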

And reconstituting a previous time slice becomes very difficult (matching up the right versions of Trade, Party and Trader again).

So we want to hold entities separately (normalised), to alleviate concerns around consistency and space usage.

Remember this means the object graph will be split across multiple machines. Each entity is independently versioned, and each piece of data is a singleton (stored only once).

Binding them back together again involves a "distributed join" => lots of network hops.

Whereas in the denormalised model the join is already done.

So what we want is the advantages of a normalised store at the speed of a denormalised one. This is what using Snowflake Schemas and the Connected Replication pattern is all about.

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys.

We can collocate data with common keys, but if the keys crosscut, the only way to collocate is to replicate. (Common keys vs crosscutting keys.)

We tackle this problem with a hybrid model: the Trade is partitioned, while Party and Trader are replicated.

We adapt the concept of a Snowflake Schema, taking the concept of Facts and Dimensions.

Everything starts from a Core Fact (Trades for us). Facts are big; dimensions are small. Facts have one key that relates them all (used to partition). Dimensions have many keys (which crosscut the partitioning key).

Looking at the data: Facts => big, related by common keys; Dimensions => small, with crosscutting keys.
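
As a concrete illustration of those key shapes (field and class names are assumptions, not the real ODC model): the facts all carry the trade id, while a dimension such as Party has its own id that is referenced from many trades, so it cannot be mapped onto the trade partitioning key.

  // Illustrative domain sketch only.
  class Trade { long tradeId; long partyId; long bookId; } // Fact: partitioned by tradeId
  class MTM   { long mtmId;   long tradeId; }              // Fact: carries tradeId => can collocate with its Trade
  class Party { long partyId; String name; }               // Dimension: its key crosscuts trade partitions
  class Book  { long bookId;  String name; }               // Dimension: likewise => replicate it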

We remember we are a grid. We should avoid the distributed join… so we only want to 'join' data that is in the same process. Trades and MTMs share a common key, so use a Key Assignment Policy (e.g. KeyAssociation in Coherence) to keep them together.
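
A sketch of how a cache key might implement Coherence's KeyAssociation so that MTM entries are stored in the same partition as their parent Trade (the class and fields are illustrative; the interface is from the Coherence 3.x API):

  import com.tangosol.net.cache.KeyAssociation;

  // Illustrative sketch: an MTM cache key that asks Coherence to collocate it with
  // its Trade by returning the trade id as the associated key.
  public class MtmKey implements KeyAssociation, java.io.Serializable {
      private final long mtmId;
      private final long tradeId; // the partitioning key shared with the Trade fact

      public MtmKey(long mtmId, long tradeId) {
          this.mtmId = mtmId;
          this.tradeId = tradeId;
      }

      @Override
      public Object getAssociatedKey() {
          return tradeId; // entries with the same associated key land in the same partition
      }

      @Override public boolean equals(Object o) {
          return o instanceof MtmKey
                  && ((MtmKey) o).mtmId == mtmId
                  && ((MtmKey) o).tradeId == tradeId;
      }
      @Override public int hashCode() {
          return Long.hashCode(mtmId) * 31 + Long.hashCode(tradeId);
      }
  }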

So we prescribe different physical storage for Facts and Dimensions: the Trade (fact) is partitioned; Party and Trader (dimensions) are replicated. Facts are partitioned; dimensions are replicated.

In the Data Layer, Fact Storage (partitioned) holds Transactions, Cashflows and MTMs; the Query Layer holds the Dimensions (replicated). Facts => distribute/partition; Dimensions => replicate.

The data volumes back this up as a sensible hypothesis: Facts => big => distribute; Dimensions => small => replicate.

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate the small stuff whose keys can't map to our partitioning key. Replicate the small; distribute the big.

So how does this help us to run queries without distributed joins?

This query involves: joins between Dimensions, and joins between Facts.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern? A chain of round trips spread across the network over time: get Cost Centres, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs.

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'. (Transactions, Cashflows and MTMs sit in partitioned storage.)

Stage 1: Get the right keys to query the Facts by joining the Dimensions in the Query Layer (Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1').

Stage 2: Cluster-join to get the Facts: join the Facts across the cluster. We can join the facts together efficiently because we know they are collocated in the same partitions.

Stage 3: Augment the raw Facts with the relevant Dimensions: bind the relevant dimensions to the result in the Query Layer.
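
A rough sketch of the three stages against plain Java maps (the cache and field names are assumptions; in a real grid stage 2 would run inside the cluster, for example via entry processors or aggregators, rather than on the client):

  import java.util.*;
  import java.util.stream.*;

  // Illustrative sketch: 'replicated' maps stand in for the dimension caches available
  // in every query-layer process; 'partitioned' maps stand in for fact storage keyed by trade id.
  class ThreeStageQuery {
      Map<String, String> costCentreToLedgerBook = new HashMap<>();  // replicated dimension
      Map<String, Set<Long>> ledgerBookToTradeIds = new HashMap<>(); // replicated dimension index
      Map<Long, Object> trades = new HashMap<>();                    // partitioned facts (by tradeId)
      Map<Long, Object> mtms = new HashMap<>();                      // partitioned facts (collocated by tradeId)

      List<Object[]> run(String costCentre) {
          // Stage 1: walk the replicated dimensions to turn the where-clause into fact keys.
          String ledgerBook = costCentreToLedgerBook.get(costCentre);
          Set<Long> tradeIds = ledgerBookToTradeIds.getOrDefault(ledgerBook, Set.of());

          // Stage 2: join the facts; because Trade and MTM share the partitioning key,
          // in the grid this join happens inside each partition with no network hops.
          // Stage 3: bind the relevant (replicated) dimensions onto each result row.
          return tradeIds.stream()
                  .map(id -> new Object[] { trades.get(id), mtms.get(id), ledgerBook })
                  .collect(Collectors.toList());
      }
  }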

Bringing it together: the Java client API queries replicated Dimensions and partitioned Facts. We never have to do a distributed join: all the big stuff is held partitioned, and we can join without shipping keys around and without intermediate results.

We get to do this… (hold the entities normalised) …and this… (keep every version) …and this… (reconstitute a previous time slice) …without the problems of this… or this… all at the speed of this… well, almost.

But there is a fly in the ointment… I lied earlier: these aren't all Facts. Some are Dimensions. This is a dimension: it has a different key to the Facts, and it's BIG.

We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication Pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected". If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large. But Connected Dimension data is tiny by comparison. One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date. (Data Layer: Fact Storage (partitioned) holding Transactions, Cashflows and MTMs. Processing Layer: Dimension Caches (replicated).)

As new Facts are added, the relevant Dimensions that they reference are moved to the processing layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change. Saving a trade causes all its 1st-level references to be triggered.

(Saving a Trade into the partitioned cache fires a trigger in the cache store; the Trade's references (Party, Alias, Source, Book, Ccy) are pushed from the Data Layer (all normalised) into the Query Layer's connected dimension caches.)

This updates the connected caches: the referenced dimensions (Party, Alias, Source, Book, Ccy) now sit in the Query Layer's connected dimension caches.

The process recurses through the object graph: the Party in turn pulls in its own references (for example its LedgerBook), and so on.

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
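
A minimal sketch of that recursion (the types and method names are assumptions, not the ODC implementation): when a fact is saved, walk its foreign-key arcs and push every dimension reached into the replicated 'connected' caches, stopping as soon as a dimension is already connected.

  import java.util.*;
  import java.util.function.*;

  // Illustrative sketch of Connected Replication.
  class ConnectedReplicator {
      // For each entity type, a function returning the dimension objects it references directly.
      private final Map<Class<?>, Function<Object, Collection<Object>>> arcs = new HashMap<>();
      private final Set<Object> connectedDimensionCache = new HashSet<>(); // stands in for the replicated caches

      <T> void registerArcs(Class<T> type, Function<Object, Collection<Object>> refs) {
          arcs.put(type, refs);
      }

      // Called from the cache-store trigger when a fact (e.g. a Trade) is saved.
      void onFactSaved(Object fact) {
          for (Object dim : referencesOf(fact)) replicateRecursively(dim);
      }

      private void replicateRecursively(Object dim) {
          if (!connectedDimensionCache.add(dim)) return; // already connected => stop recursing
          for (Object next : referencesOf(dim)) replicateRecursively(next);
      }

      private Collection<Object> referencesOf(Object entity) {
          return arcs.getOrDefault(entity.getClass(), x -> List.of()).apply(entity);
      }
  }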

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:

• Data set size: the size of the connected dimensions limits scalability.

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.

• At one end of the scale are the huge shared-nothing architectures. These favour scalability.

• At the other end are in-memory architectures, ideally using a single address space.

• You can blend the two approaches (for example ODC).

• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so we can do any join in a single step against partitioned storage.

• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com

• Questions?

  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 61: Advanced databases   ben stopford

ODC ndash Distributed Shared Nothing In Memory Semi-Normalised

Realtime Graph DB

450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAM

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 62: Advanced databases   ben stopford

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 63: Advanced databases   ben stopford

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 64: Advanced databases   ben stopford

How should we use these tools

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3 Bind relevant dimensions to the result
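
Putting the three stages into one hypothetical method makes the flow easier to follow. This is only a sketch of the idea, not the ODC query engine: the DimensionCaches and FactStorage interfaces below are stand-ins for the replicated dimension caches and the partitioned fact storage.

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stand-in for the replicated dimension caches held in the query layer.
interface DimensionCaches {
    Set<Long> tradeIdsForCostCentre(String costCentre);               // resolved purely from replicated data
    Map<String, Object> lookupDimensions(Collection<Object> dimensionKeys);
}

// Stand-in for the partitioned fact storage; joins here are partition-local only.
interface FactStorage {
    List<Map<String, Object>> joinFactsByTradeId(Set<Long> tradeIds);
}

class ThreeStageQuery {
    private final DimensionCaches dimensions;
    private final FactStorage facts;

    ThreeStageQuery(DimensionCaches dimensions, FactStorage facts) {
        this.dimensions = dimensions;
        this.facts = facts;
    }

    List<Map<String, Object>> run(String costCentre) {
        // Stage 1: join Dimensions in the query layer to turn the where clause
        // (Cost Centre = 'CC1') into the set of partitioning keys to query.
        Set<Long> tradeIds = dimensions.tradeIdsForCostCentre(costCentre);

        // Stage 2: join the Facts (Transactions, MTMs, Cashflows) across the cluster.
        // They share the partitioning key, so each join happens inside one partition.
        List<Map<String, Object>> factRows = facts.joinFactsByTradeId(tradeIds);

        // Stage 3: augment the raw Facts with the relevant replicated Dimensions.
        List<Map<String, Object>> results = new ArrayList<>();
        for (Map<String, Object> row : factRows) {
            Map<String, Object> enriched = new HashMap<>(row);
            enriched.putAll(dimensions.lookupDimensions(row.values()));  // bind Party, Book, Ccy, ...
            results.add(enriched);
        }
        return results;
    }
}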

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do this…

Trade

Party

Trader

Trade

Party

Trader

…and this…

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

…without the problems of this…

…or this

all at the speed of this… well almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts

Facts

Dimensions

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate

'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date

Data Layer

Dimension Caches (Replicated)

Transactions

Cashflows

Processing Layer

Mtms

Fact Storage (Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all of its 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer (All Normalised)

Query Layer (With connected dimension caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer (All Normalised)

Query Layer (With connected dimension caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer (All Normalised)

Query Layer (With connected dimension caches)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
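
A small sketch of that recursion, assuming a generic Entity abstraction over the domain model; it is illustrative, not the production implementation.

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// When a Fact is written we recurse through its foreign keys, pushing each referenced
// Dimension into the replicated dimension caches and then following that Dimension's
// own references (Party -> LedgerBook, and so on). Dimensions that no Fact references
// are never replicated.
class ConnectedReplicator {

    /** A node in the domain model: an entity plus the entities its foreign keys point at. */
    interface Entity {
        Object key();
        Collection<Entity> references();   // the arcs of the domain model
    }

    private final Map<Object, Entity> replicatedDimensionCache = new HashMap<>();

    /** Called (for example from the trigger on the partitioned cache described above) when a Fact is saved. */
    void onFactSaved(Entity fact) {
        for (Entity dimension : fact.references()) {
            replicateConnected(dimension);
        }
    }

    private void replicateConnected(Entity dimension) {
        if (replicatedDimensionCache.containsKey(dimension.key())) {
            return;                          // already connected - stop recursing
        }
        replicatedDimensionCache.put(dimension.key(), dimension);
        for (Entity next : dimension.references()) {
            replicateConnected(next);        // follow the foreign keys onwards
        }
    }
}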

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)

Limitations of this approach

• Data set size: the size of connected dimensions limits scalability

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability.

Conclusion

At the other end are in memory architectures, ideally using a single address space.

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern

The End
• Further details online

http://www.benstopford.com

• Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 65: Advanced databases   ben stopford

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 66: Advanced databases   ben stopford

Partitioning scalesKeys Aa-Ap

Scalable storage bandwidth and processing

Associating data in different partitions implies moving it

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 67: Advanced databases   ben stopford

So we have some dataOur data is bound together

in a model

Trade

PartyTrader

Desk

Name

Sub

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 68: Advanced databases   ben stopford

Which we save

Trade

Party

Trader

Trade

Party

Trader

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a "distributed join" => lots of network hops

(Diagram: the distributed join reaching across nodes for Trade, Party and Trader)

Whereas in the denormalised model the join is already done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern is all about

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys

We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate

(Diagram: data related by common keys vs data related by crosscutting keys)

We tackle this problem with a hybrid model

(Diagram: Trade is partitioned; Party and Trader are replicated)

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are big, dimensions are small

Facts have one key that relates them all (used to partition)

Dimensions have many keys (which crosscut the partitioning key)

Looking at the data:

Facts => big, common keys

Dimensions => small, crosscutting keys

We remember we are a grid. We should avoid the distributed join…

…so we only want to 'join' data that is in the same process

(Diagram: Trades and MTMs share a common key)

Use a Key Assignment Policy (e.g. KeyAssociation in Coherence)
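As a rough illustration of what such a key-assignment policy can look like, the sketch below assumes Oracle Coherence's KeyAssociation interface; the MtmKey class and its fields are hypothetical, not the actual ODC classes. Returning the trade id from getAssociatedKey() asks the grid to place each MTM entry in the same partition as its parent Trade, so the Trade/MTM 'join' stays in-process.

```java
import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;

// Hypothetical key for an MTM entry. Returning the parent trade's id from
// getAssociatedKey() tells the grid to store this entry in the same
// partition as the Trade it belongs to, so the Trade/MTM join is local.
public class MtmKey implements KeyAssociation, Serializable {

    private final long mtmId;
    private final long tradeId;   // the common (partitioning) key

    public MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;           // collocate with the owning Trade
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MtmKey && ((MtmKey) o).mtmId == mtmId;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(mtmId);
    }
}
```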

So we prescribe different physical storage for Facts and Dimensions

(Diagram: Trade is partitioned; Party and Trader are replicated)

Facts are partitioned, dimensions are replicated

(Diagram: a Data Layer holding partitioned Fact Storage for Transactions, Cashflows and MTMs, and a Query Layer holding the replicated dimensions; Facts are distributed/partitioned, Dimensions are replicated)

The data volumes back this up as a sensible hypothesis:

Facts => big => distribute

Dimensions => small => replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate the small stuff whose keys can't map to our partitioning key
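A minimal sketch of what this split might look like to client code, assuming Oracle Coherence: the cache names are invented, and whether each cache is partitioned or replicated is actually decided by the cache configuration (a distributed-scheme vs a replicated-scheme), not by the code itself.

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

// Illustrative wiring only: the cache names are invented, and the
// partitioned/replicated choice lives in the cache configuration
// (distributed-scheme vs replicated-scheme), not in this code.
public class OdcCaches {

    // Facts: big, all relatable via the trade id => partitioned across the grid.
    static final NamedCache trades    = CacheFactory.getCache("trades");
    static final NamedCache mtms      = CacheFactory.getCache("mtms");
    static final NamedCache cashflows = CacheFactory.getCache("cashflows");

    // Dimensions: small, crosscutting keys => replicated to every node.
    static final NamedCache parties     = CacheFactory.getCache("parties");
    static final NamedCache traders     = CacheFactory.getCache("traders");
    static final NamedCache costCentres = CacheFactory.getCache("costCentres");
}
```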

So how do they help us to run queries without distributed joins

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern

(Diagram: a chain of sequential fetches spread over the network and time: Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centers)

But by balancing Replication and Partitioning we don't need all those hops

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'

(Diagram: Transactions, Cashflows and MTMs in Partitioned Storage)

Stage 1: Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'


Stage 2: Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated


Stage 3: Augment raw Facts with relevant Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result

Bringing it together

(Diagram: a Java client API over Replicated Dimensions and Partitioned Facts)

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and having intermediate results
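To make the three stages concrete, here is a deliberately simplified sketch of the flow: plain maps stand in for the replicated dimension caches held in each query-layer process, and the fact lookup stands in for the collocated, partitioned query. All the names and shapes here are assumptions for illustration, not the ODC API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-ins: maps play the role of the replicated dimension
// caches and of the partitioned fact store. Names are illustrative only.
public class ThreeStageQuery {

    // Stage 1: join dimensions locally, turning "Cost Centre = 'CC1'" into
    // the set of trade ids (the facts' common partitioning key).
    static Set<Long> tradeIdsFor(String costCentre,
                                 Map<String, Set<Long>> tradeIdsByCostCentre) {
        return tradeIdsByCostCentre.getOrDefault(costCentre, Collections.emptySet());
    }

    // Stage 2: fetch and join the facts. Trade, MTM and Cashflow entries share
    // the trade id, so each join happens inside a single partition.
    static List<Object[]> factsFor(Set<Long> tradeIds,
                                   Map<Long, Object> mtmByTradeId,
                                   Map<Long, Object> cashflowByTradeId) {
        List<Object[]> rows = new ArrayList<>();
        for (Long id : tradeIds) {
            rows.add(new Object[] { id, mtmByTradeId.get(id), cashflowByTradeId.get(id) });
        }
        return rows;
    }

    // Stage 3: bind the replicated dimension data (e.g. reference data) to the
    // raw facts locally, with no further network hops.
    static List<Object[]> bindDimensions(List<Object[]> rows,
                                         Map<Long, Object> referenceDataByTradeId) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : rows) {
            Long tradeId = (Long) row[0];
            out.add(new Object[] { row, referenceDataByTradeId.get(tradeId) });
        }
        return out;
    }
}
```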

We get to do this…

(Diagram: normalised Trade, Party and Trader entities held separately)

…and this…

(Diagram: independent versions of the Trade/Party/Trader graph, Version 1 through Version 4)

and this

(Diagram: reconstituting a time slice across many Trade, Party and Trader versions)

…without the problems of this…

…or this

…all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts

(Diagram: some of the 'Facts' are actually Dimensions)

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused

So we only replicate 'Connected' or 'Used' dimensions

As data is written to the data store we keep our 'Connected Caches' up to date

(Diagram: a Data Layer with partitioned Fact Storage for Transactions, Cashflows and MTMs, and a Processing Layer with replicated Dimension Caches)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st level references to be triggered

(Diagram: saving a Trade into the partitioned cache fires a trigger in the cache store; its Party, Alias, Source, Book and Ccy references flow from the Data Layer (all normalised) to the Query Layer's connected dimension caches)

This updates the connected caches

(Diagram: the Trade's Party, Alias, Source, Book and Ccy dimensions now populate the connected caches in the Query Layer)

The process recurses through the object graph

(Diagram: the recursion continues outwards from the Trade, pulling in further dimensions such as Party and LedgerBook)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
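A sketch of that recursion only, with Entity, references() and the connected-dimension cache as hypothetical stand-ins for the trigger that fires when a fact is saved into the partitioned store:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical shapes: a saved fact exposes its outgoing references (the arcs
// of the domain model), and the connected-dimension cache is just a map here.
public class ConnectedReplication {

    interface Entity {
        Object key();
        List<Entity> references();
    }

    // Fired when a fact (e.g. a Trade) is written to the partitioned store:
    // walk the object graph and push every reachable dimension into the
    // replicated 'connected' cache, visiting each node once.
    static void replicateConnected(Entity savedFact,
                                   Map<Object, Entity> connectedDimensionCache) {
        Deque<Entity> toVisit = new ArrayDeque<>(savedFact.references());
        Set<Object> seen = new HashSet<>();
        while (!toVisit.isEmpty()) {
            Entity dimension = toVisit.pop();
            if (!seen.add(dimension.key())) {
                continue;                                 // already visited
            }
            connectedDimensionCache.put(dimension.key(), dimension);
            toVisit.addAll(dimension.references());       // recurse through its arcs
        }
    }
}
```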

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)

Limitations of this approach

• Data set size: the size of the connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared-nothing architectures. These favour scalability

Conclusion

At the other end are in-memory architectures, ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step


Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern

The End

• Further details online: http://www.benstopford.com
• Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 69: Advanced databases   ben stopford

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 70: Advanced databases   ben stopford

The hops have to be spread over time

Network

Time

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 71: Advanced databases   ben stopford

Lots of network hops makes it slow

OK ndash what if we held it all together ldquoDenormalisedrdquo

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
Page 72: Advanced databases   ben stopford

OK – what if we held it all together, "Denormalised"?

Hence denormalisation is FAST (for reads).

But denormalisation implies the duplication of some sub-entities…

…and that means managing consistency over lots of copies…

…and all the duplication means you run out of space really quickly.

Space issues are exaggerated further when data is versioned.

[Diagram: the Trade/Party/Trader graph duplicated as Version 1, Version 2, Version 3 and Version 4]

…and you need versioning to do MVCC.

And reconstituting a previous time slice becomes very difficult.

[Diagram: overlapping versions of Trade, Party and Trader entities from which a historical slice must be reassembled]

So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage.

Remember: this means the object graph will be split across multiple machines.

[Diagram: Trade, Party and Trader entities spread across machines; independently versioned; each datum held as a singleton]

Binding them back together involves a "distributed join" => lots of network hops.

Whereas in the denormalised model the join is already done.

So what we want is the advantages of a normalised store at the speed of a denormalised one.

This is what using Snowflake Schemas and the Connected Replication pattern is all about.

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys.

We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate.

[Diagram: entities related by common keys vs. entities related by crosscutting keys]

We tackle this problem with a hybrid model:

[Diagram: Trade is partitioned; Party and Trader are replicated]

We adapt the concept of a Snowflake Schema, taking the concept of Facts and Dimensions.

Everything starts from a Core Fact (Trades, for us).

Facts are big; dimensions are small.

Facts have one key that relates them all (used to partition).

Dimensions have many keys (which crosscut the partitioning key).

Looking at the data: Facts => big, common keys. Dimensions => small, crosscutting keys.

We remember we are a grid. We should avoid the distributed join…

…so we only want to 'join' data that is in the same process.

Trades and MTMs share a common key, so we use a Key Assignment Policy (e.g. KeyAssociation in Coherence) to keep them in the same partition, as sketched below.
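A minimal sketch of that key assignment, assuming hypothetical TradeKey/MtmKey classes (the real ODC key classes are not shown): the MTM key reports its owning trade's id as the associated key, so Coherence routes the MTM to the same partition as its Trade.

```java
import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;

// Hypothetical key for an MTM fact. Coherence places any key whose
// getAssociatedKey() matches another key's value in the same partition,
// so MTMs end up stored alongside the Trade they belong to.
public class MtmKey implements KeyAssociation, Serializable {
    private final long mtmId;
    private final long tradeId;  // the partitioning key shared with Trade

    public MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;          // collocate with the parent Trade
    }

    // equals() and hashCode() over both fields omitted for brevity
}
```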

So we prescribe different physical storage for Facts and Dimensions.

[Diagram: Trade is partitioned; Party and Trader are replicated]

Facts are partitioned; dimensions are replicated.

[Diagram: the Data Layer and the Query Layer. Transactions, Cashflows and MTMs sit in partitioned Fact Storage (facts: distribute/partition); the dimensions are replicated]
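Purely illustrative, here is how that storage prescription might look from a Coherence client, assuming hypothetical cache names whose partitioned or replicated schemes are declared in the cluster's cache configuration rather than in code:

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class OdcStorageSketch {
    public static void main(String[] args) {
        // Facts: big, related through the trade id, so held in partitioned caches.
        NamedCache trades = CacheFactory.getCache("Trades");
        NamedCache mtms   = CacheFactory.getCache("MTMs");

        // Dimensions: small, with crosscutting keys, so held in replicated caches;
        // every node has a full copy and joins to them never leave the process.
        NamedCache parties = CacheFactory.getCache("Parties");
        NamedCache traders = CacheFactory.getCache("Traders");

        System.out.println("caches wired: " + trades.getCacheName() + ", "
                + mtms.getCacheName() + ", " + parties.getCacheName() + ", "
                + traders.getCacheName());
    }
}
```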

The data volumes back this up as a sensible hypothesis: Facts => big => distribute. Dimensions => small => replicate.

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.

So how does this help us to run queries without distributed joins?

This query involves joins between Dimensions and joins between Facts:

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern?

[Diagram: a timeline of successive remote calls: Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centers; each hop adds network time]
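To make the hop count concrete, here is a rough sketch of that naive approach with entirely hypothetical lookup methods; each call is a remote round trip that must finish before the next one can start.

```java
import java.util.List;

// Hypothetical remote store: every method call is a network round trip.
interface RemoteStore {
    List<String> ledgerBooksFor(String costCentre);
    List<String> sourceBooksFor(List<String> ledgerBookIds);
    List<String> transactionsFor(List<String> sourceBookIds);
    List<String> mtmsFor(List<String> transactionIds);
}

class NaiveDistributedJoin {
    // Sequential hops, with intermediate key sets shipped back and forth,
    // before a single result row can be assembled.
    static List<String> mtmsForCostCentre(RemoteStore remote, String costCentre) {
        List<String> ledgerBooks  = remote.ledgerBooksFor(costCentre);    // hop 1
        List<String> sourceBooks  = remote.sourceBooksFor(ledgerBooks);   // hop 2
        List<String> transactions = remote.transactionsFor(sourceBooks);  // hop 3
        return remote.mtmsFor(transactions);                              // hop 4
    }
}
```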

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause (Where Cost Centre = 'CC1'). Join the Dimensions in the Query Layer to get the right keys to query the Facts.

Stage 2: Cluster Join to get the Facts. Join the Facts across the cluster; this is efficient because we know they are collocated.

Stage 3: Augment the raw Facts with the relevant Dimensions. Bind the relevant dimensions to the result, again in the Query Layer.

(At every stage the query is the same: Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'; the Transactions, Cashflows and MTMs themselves live in partitioned storage.)
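A conceptual sketch of those three stages from the client's perspective, using hypothetical domain records and plain maps in place of the replicated and partitioned caches (this is not the ODC API), and assuming one MTM per transaction for brevity:

```java
import java.util.*;
import java.util.stream.Collectors;

// Conceptual sketch only: plain maps stand in for the replicated (dimension)
// and partitioned (fact) caches; every type and field name is hypothetical.
class ThreeStageQuery {

    record LedgerBook(String id, String costCentreId) {}
    record SourceBook(String id, String ledgerBookId) {}
    record Transaction(String id, String sourceBookId) {}
    record Mtm(String id, String transactionId) {}
    record Row(Transaction txn, Mtm mtm, SourceBook book) {}

    // Replicated dimensions: a full copy is local to every node.
    final Map<String, LedgerBook> ledgerBooks = new HashMap<>();
    final Map<String, SourceBook> sourceBooks = new HashMap<>();

    // Partitioned facts: spread across the cluster, collocated by a shared key.
    final Map<String, Transaction> transactions = new HashMap<>();
    final Map<String, Mtm> mtms = new HashMap<>();

    List<Row> selectByCostCentre(String costCentreId) {
        // Stage 1: resolve the where clause against local, replicated dimensions
        // to find the fact keys we need (no network hops).
        Set<String> bookIds = sourceBooks.values().stream()
                .filter(sb -> {
                    LedgerBook lb = ledgerBooks.get(sb.ledgerBookId());
                    return lb != null && lb.costCentreId().equals(costCentreId);
                })
                .map(SourceBook::id)
                .collect(Collectors.toSet());

        // Stage 2: join the facts; Transactions and MTMs share a partitioning
        // key, so this join runs inside each partition rather than across them.
        Map<String, Mtm> mtmByTxn = mtms.values().stream()
                .collect(Collectors.toMap(Mtm::transactionId, m -> m));

        // Stage 3: bind the relevant dimensions back onto each fact row.
        return transactions.values().stream()
                .filter(t -> bookIds.contains(t.sourceBookId()))
                .map(t -> new Row(t, mtmByTxn.get(t.id()), sourceBooks.get(t.sourceBookId())))
                .collect(Collectors.toList());
    }
}
```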

Bringing it together: a Java client API over Replicated Dimensions and Partitioned Facts.

We never have to do a distributed join. So all the big stuff is held partitioned, and we can join without shipping keys around and without intermediate results.

We get to do this… and this… and this…

[Diagrams: the normalised object graph, the independently versioned graph, and the reconstituted time slice shown earlier]

…without the problems of this… or this… all at the speed of this… well, almost.

But there is a fly in the ointment…

I lied earlier. These aren't all Facts.

[Diagram: Facts vs. Dimensions. One of these is really a dimension: it has a different key to the Facts, and it's BIG]

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication Pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large. But Connected Dimension Data is tiny by comparison.

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date.

[Diagram: Data Layer with Transactions, Cashflows and MTMs in partitioned Fact Storage; Processing Layer with replicated Dimension Caches]

As new Facts are added, the relevant Dimensions that they reference are moved to the processing-layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.

Saving a trade causes all its first-level references to be triggered.

[Diagram: a Save Trade call hits the Partitioned Cache in the Data Layer (all normalised); a Cache Store trigger pushes the Trade's references (Party, Alias, Source Book, Ccy) towards the Query Layer (with connected dimension caches)]

This updates the connected caches.

The process recurses through the object graph, picking up second-level references (e.g. LedgerBook) as it goes.

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated, as sketched below.
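A minimal sketch of that idea, with hypothetical types standing in for the domain model and the replicated layer (this is not the ODC implementation): a trigger fires when a fact is saved and recurses through its references, replicating each dimension the first time it becomes connected.

```java
import java.util.*;

// Conceptual sketch of Connected Replication: all names are hypothetical.
class ConnectedReplicator {

    interface Dimension {
        Object key();
        List<Dimension> references();  // the foreign-key arcs of the domain model
    }

    // Stands in for the replicated dimension caches in the query layer.
    private final Map<Object, Dimension> replicatedLayer = new HashMap<>();

    // Called by the cache-store trigger whenever a fact (e.g. a Trade) is saved.
    void onFactSaved(List<Dimension> firstLevelReferences) {
        firstLevelReferences.forEach(this::replicateIfConnected);
    }

    // Recurse through the arcs of the domain model, replicating each dimension
    // the first time it becomes 'connected' to a stored fact; putIfAbsent also
    // stops the recursion revisiting dimensions it has already seen.
    private void replicateIfConnected(Dimension d) {
        if (replicatedLayer.putIfAbsent(d.key(), d) == null) {
            d.references().forEach(this::replicateIfConnected);
        }
    }
}
```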

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step against partitioned storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com
• Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 73: Advanced databases   ben stopford

Hence denormalisation is FAST

(for reads)

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 74: Advanced databases   ben stopford

Denormalisation implies the duplication of some

sub-entities

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 75: Advanced databases   ben stopford

hellipand that means managing consistency over

lots of copies

hellipand all the duplication means you run out of space really quickly

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
Page 76: Advanced databases   ben stopford

…and all the duplication means you run out of space really quickly

Space issues are exacerbated further when data is versioned

[Diagram: the Trade-Party-Trader graph duplicated for Versions 1 to 4]

…and you need versioning to do MVCC

And reconstituting a previous time slice becomes very difficult.

[Diagram: many Trade, Party and Trader versions scattered across the store]
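To make the versioning problem a little more concrete, here is a minimal plain-Java sketch. The VersionedKey and Versioned classes are illustrative names, not ODC's real model; the point is that reconstituting a time slice means repeating this "newest version already valid at the instant" lookup for every entity in the graph.

```java
import java.time.Instant;
import java.util.Map;

// Illustrative only: each entity version lives under its own key, as on the slides.
final class VersionedKey {
    final String entityId;   // e.g. "trade-42"
    final int version;       // increases every time the entity changes
    VersionedKey(String entityId, int version) { this.entityId = entityId; this.version = version; }
    @Override public boolean equals(Object o) {
        return o instanceof VersionedKey
                && ((VersionedKey) o).entityId.equals(entityId)
                && ((VersionedKey) o).version == version;
    }
    @Override public int hashCode() { return entityId.hashCode() * 31 + version; }
}

final class Versioned<T> {
    final T value;
    final Instant validFrom; // when this version became current
    Versioned(T value, Instant validFrom) { this.value = value; this.validFrom = validFrom; }
}

final class TimeSlice {
    // Reconstituting a time slice means doing this for *every* entity in the
    // object graph: pick the newest version that was already valid at 'asOf'.
    static <T> T asOf(Map<VersionedKey, Versioned<T>> store, String entityId, Instant asOf) {
        T best = null;
        int bestVersion = -1;
        for (Map.Entry<VersionedKey, Versioned<T>> e : store.entrySet()) {
            VersionedKey key = e.getKey();
            Versioned<T> candidate = e.getValue();
            if (key.entityId.equals(entityId)
                    && !candidate.validFrom.isAfter(asOf)
                    && key.version > bestVersion) {
                best = candidate.value;
                bestVersion = key.version;
            }
        }
        return best; // null if the entity did not exist yet at 'asOf'
    }
}
```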

So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage.

Remember this means the object graph will be split across multiple machines

[Diagram: two Trade-Party-Trader graphs on separate machines; entities are independently versioned and each datum is a singleton]

Binding them back together involves a "distributed join" => lots of network hops.

[Diagram: the split Trade-Party-Trader graphs being joined across the network]

Whereas in the denormalised model the join is already done.

So what we want are the advantages of a normalised store at the speed of a denormalised one.

This is what using Snowflake Schemas and the Connected Replication pattern is all about.

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys.

We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate.

[Diagram: entities related by common keys vs entities related by crosscutting keys]

We tackle this problem with a hybrid model

[Diagram: Trade is partitioned; Party and Trader are replicated]

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are big, dimensions are small.

Facts have one key that relates them all (used to partition).

Dimensions have many keys (which crosscut the partitioning key).

Looking at the data

Facts => big, common keys
Dimensions => small, crosscutting keys

We remember we are a grid. We should avoid the distributed join…

…so we only want to 'join' data that is in the same process.

[Diagram: Trades and MTMs related by a common key]

Use a Key Assignment Policy (e.g. KeyAssociation in Coherence).
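As a rough sketch of such a policy, assuming Coherence's KeyAssociation interface (the pre-generics form) and made-up field names, an MTM key can declare the trade id as its associated key so each MTM lands in the same partition as its parent Trade:

```java
import com.tangosol.net.cache.KeyAssociation;
import java.io.Serializable;

// Illustrative key: MTM entries are associated with their parent trade's key,
// so the grid places them in the same partition and a Trade<->MTM 'join'
// never has to leave the owning node.
public class MtmKey implements KeyAssociation, Serializable {

    private final long mtmId;
    private final long tradeId;   // the partitioning key shared with the Trade

    public MtmKey(long mtmId, long tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    @Override
    public Object getAssociatedKey() {
        return tradeId;           // collocate with the Trade keyed by tradeId
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MtmKey)) return false;
        MtmKey other = (MtmKey) o;
        return mtmId == other.mtmId && tradeId == other.tradeId;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(mtmId) * 31 + Long.hashCode(tradeId);
    }
}
```

Coherence also lets the same mapping be supplied centrally via a KeyAssociator on the cache service; either way, facts sharing the partitioning key end up on the same node.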

So we prescribe different physical storage for Facts and Dimensions.

[Diagram: Trade partitioned; Party and Trader replicated]

Facts are partitioned, dimensions are replicated.

[Diagram: Data Layer and Query Layer; Transactions, Cashflows and MTMs in Fact Storage (Partitioned); Trade, Party and Trader]

Facts are partitioned, dimensions are replicated.

[Diagram: Transactions, Cashflows and MTMs in Fact Storage (Partitioned / distributed); Dimensions (replicated)]

The data volumes back this up as a sensible hypothesis

Facts => big => distribute
Dimensions => small => replicate
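A minimal sketch of what that prescription looks like from client code, assuming Coherence NamedCaches whose (made-up) names are mapped in the cache configuration to a distributed scheme for facts and a replicated scheme for dimensions:

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public final class OdcCaches {

    // Facts: big, share the partitioning key, so they live in distributed
    // (partitioned) caches and are spread across the grid.
    public static NamedCache trades()      { return CacheFactory.getCache("dist-trades"); }
    public static NamedCache mtms()        { return CacheFactory.getCache("dist-mtms"); }
    public static NamedCache cashflows()   { return CacheFactory.getCache("dist-cashflows"); }

    // Dimensions: small, with crosscutting keys, so they live in replicated
    // caches and every node holds a full local copy for in-process 'joins'.
    public static NamedCache parties()     { return CacheFactory.getCache("repl-parties"); }
    public static NamedCache traders()     { return CacheFactory.getCache("repl-traders"); }
    public static NamedCache costCentres() { return CacheFactory.getCache("repl-costcentres"); }

    private OdcCaches() { }
}
```

The "dist-" and "repl-" prefixes are assumptions for the example; the actual partitioned vs replicated behaviour comes from the scheme each cache name maps onto in the cache configuration.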

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.

[Diagram: small entities replicated, big entities distributed]

So how does this help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern?

[Diagram: a sequence of network hops spread over time: Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centers]

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'

[Diagram: Transactions, Cashflows and MTMs in Partitioned Storage]

Stage 1: Get the right keys to query the Facts. Join Dimensions in the Query Layer.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Cluster Join to get Facts. Join Dimensions in the Query Layer, then join Facts across the cluster.

Stage 2: Join the facts together efficiently, as we know they are collocated.

Stage 3: Augment raw Facts with relevant Dimensions. Join Dimensions in the Query Layer, join Facts across the cluster, then join Dimensions in the Query Layer again.

Stage 3: Bind relevant dimensions to the result.
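To make the three stages concrete, here is a self-contained plain-Java sketch of the flow; plain Maps stand in for the replicated and partitioned caches, and all field names and the book-id hop are illustrative rather than the actual ODC model:

```java
import java.util.*;

// Sketch of the three query stages for:
//   Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref
//   Where Cost Centre = 'CC1'
public class ThreeStageQuery {

    @SuppressWarnings("unchecked")
    public List<Map<String, Object>> run(
            Map<String, Map<String, Object>> costCentresReplicated,   // dimension: ccId -> {name, bookIds}
            Map<String, Map<String, Object>> transactionsPartitioned, // fact: txId -> {bookId, mtmId, ccId, ...}
            Map<String, Map<String, Object>> mtmsPartitioned,         // fact: mtmId -> {...}, collocated with its trade
            String costCentreName) {

        // Stage 1: resolve the where clause against replicated dimensions only.
        // These reads are local to the process, so there are no network hops.
        Set<String> bookIds = new HashSet<>();
        for (Map<String, Object> cc : costCentresReplicated.values()) {
            if (costCentreName.equals(cc.get("name"))) {
                bookIds.addAll((Collection<String>) cc.get("bookIds"));
            }
        }

        // Stage 2: pick the facts whose bookId matched. In the grid this would
        // be one parallel query over the partitioned fact caches, and because
        // Transactions and MTMs share the partitioning key the Transaction-to-MTM
        // lookup stays node-local; here we simply scan the stand-in map.
        List<Map<String, Object>> results = new ArrayList<>();
        for (Map<String, Object> tx : transactionsPartitioned.values()) {
            if (!bookIds.contains(tx.get("bookId"))) continue;
            Map<String, Object> mtm = mtmsPartitioned.get(tx.get("mtmId"));

            // Stage 3: bind the relevant dimension data (again a local read
            // from the replicated caches) onto each fact row.
            Map<String, Object> refData = costCentresReplicated.get(tx.get("ccId"));

            Map<String, Object> row = new HashMap<>();
            row.put("transaction", tx);
            row.put("mtm", mtm);
            row.put("referenceData", refData);
            results.add(row);
        }
        return results;
    }
}
```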

Bringing it together

[Diagram: a Java client API over Replicated Dimensions and Partitioned Facts]

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and having intermediate results.

We get to do this…

[Diagram: normalised Trade-Party-Trader graphs held on separate machines]

…and this…

[Diagram: the Trade-Party-Trader graph across Versions 1 to 4]

and this

[Diagram: reconstituting a time slice from many Trade, Party and Trader versions]

…without the problems of this…

…or this

all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts.

[Diagram: the model split into Facts and Dimensions]

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.

Fortunately there is a simple solution

The Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data, some are quite large.

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date.

[Diagram: Data Layer and Processing Layer; Transactions, Cashflows and MTMs in Fact Storage (Partitioned); Dimension Caches (Replicated)]

As new Facts are added, relevant Dimensions that they reference are moved to processing layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st-level references to be triggered.

[Diagram: 'Save Trade' into the Partitioned Cache of the Data Layer (All Normalised) fires a Cache Store trigger; the Trade references Party, Alias, Source, Book and Ccy; the Query Layer holds the connected dimension caches]

This updates the connected caches

[Diagram: Party, Alias, Source, Book and Ccy now sit in the Query Layer's connected dimension caches; the Data Layer stays fully normalised]

The process recurses through the object graph

[Diagram: the recursion continues through the object graph, pulling in LedgerBook via Party]

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
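A bare-bones sketch of that recursion, assuming a hypothetical write hook on the fact store and a hypothetical DomainObject accessor for following references; it illustrates the pattern rather than the ODC implementation:

```java
import java.util.*;

// Sketch of the Connected Replication recursion. DomainObject, its
// references() accessor and the replicatedCaches map are assumptions made
// for the example; the real trigger hangs off the fact cache's write path.
public class ConnectedReplicator {

    /** One replicated 'connected dimension' cache per dimension type. */
    private final Map<Class<?>, Map<Object, DomainObject>> replicatedCaches = new HashMap<>();

    /** Called whenever a fact (e.g. a Trade) is saved to the partitioned store. */
    public void onFactSaved(DomainObject fact) {
        Set<Object> visited = new HashSet<>();
        for (DomainObject referenced : fact.references()) {
            replicateConnected(referenced, visited);
        }
    }

    /** Walk the arcs of the domain model, replicating each dimension reached. */
    private void replicateConnected(DomainObject dimension, Set<Object> visited) {
        if (!visited.add(dimension.key())) {
            return;                                // already handled: stop cycles
        }
        replicatedCaches
                .computeIfAbsent(dimension.getClass(), c -> new HashMap<>())
                .put(dimension.key(), dimension);  // this dimension is now 'connected'

        for (DomainObject next : dimension.references()) {
            replicateConnected(next, visited);     // recurse: Trade -> Party -> LedgerBook ...
        }
    }

    /** Minimal shape a domain entity needs for the walk (illustrative). */
    public interface DomainObject {
        Object key();
        Collection<DomainObject> references();
    }
}
```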

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach

• Data set size: the size of connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability.

Conclusion

At the other end are in-memory architectures, ideally using a single address space.

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step


Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End
• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 77: Advanced databases   ben stopford

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

hellipand you need versioning to do MVCC

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 78: Advanced databases   ben stopford

And reconstituting a previous time slice

becomes very diffi cultTrad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 79: Advanced databases   ben stopford

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Remember this means the object graph will be split across multiple machines

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
Page 80: Advanced databases   ben stopford

Remember: this means the object graph will be split across multiple machines.

(Diagram: two object graphs of Trade, Party and Trader split across machines; each entity is independently versioned and each piece of data is a singleton)

Binding them back together involves a "distributed join" => lots of network hops.

(Diagram: Trade, Party and Trader objects re-joined across machine boundaries)

Whereas in the denormalised model the join is already done.

So what we want are the advantages of a normalised store at the speed of a denormalised one.

This is what using Snowflake Schemas and the Connected Replication pattern is all about.

Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?

It's all about the keys.

We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate.

(Diagram: entities related by common keys vs entities related by crosscutting keys)

We tackle this problem with a hybrid model.

(Diagram: Trade is partitioned; Party and Trader are replicated)

We adapt the concept of a Snowflake Schema, taking the concept of Facts and Dimensions.

Everything starts from a Core Fact (Trades for us).

Facts are big; dimensions are small.

Facts have one key that relates them all (used to partition).

Dimensions have many keys (which crosscut the partitioning key).

Looking at the data:
Facts => big, common keys
Dimensions => small, crosscutting keys

We remember we are a grid. We should avoid the distributed join…

… so we only want to 'join' data that is in the same process.

(Diagram: Trades and MTMs share a common key; use a key assignment policy, e.g. KeyAssociation in Coherence, to collocate them)
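
A minimal sketch of what such a key assignment might look like, assuming Oracle Coherence's KeyAssociation interface is on the classpath (the MtmKey class and its fields are illustrative, not ODC's actual model):

    import java.io.Serializable;
    import com.tangosol.net.cache.KeyAssociation;

    // Illustrative key for an MTM fact. Coherence routes every entry whose key
    // reports the same associated key to the same partition, so each MTM lands
    // in the same process as the Trade it belongs to and can be joined locally.
    public class MtmKey implements KeyAssociation, Serializable {

        private final long mtmId;
        private final long tradeId;   // the common key shared with the Trade fact

        public MtmKey(long mtmId, long tradeId) {
            this.mtmId = mtmId;
            this.tradeId = tradeId;
        }

        @Override
        public Object getAssociatedKey() {
            return tradeId;           // collocate with the Trade keyed by tradeId
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof MtmKey
                    && ((MtmKey) o).mtmId == mtmId
                    && ((MtmKey) o).tradeId == tradeId;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(mtmId) * 31 + Long.hashCode(tradeId);
        }
    }

With keys shaped like this, a Trade and all of its MTMs resolve to the same partition, so the 'join' between them never has to leave the owning node.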

So we prescribe different physical storage for Facts and Dimensions.

(Diagram: Trade is partitioned; Party and Trader are replicated)

Facts are partitioned, dimensions are replicated.

(Diagram labels: Data Layer; Query Layer; Transactions; Cashflows; MTMs; Fact Storage (partitioned); Trade; Party/Trader)

Facts are partitioned, dimensions are replicated (2).

(Diagram: Transactions, Cashflows and MTMs as Facts (distribute / partition) in Fact Storage (partitioned); Dimensions (replicate))

The data volumes back this up as a sensible hypothesis:

Facts => big => distribute
Dimensions => small => replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.

(Diagram: Replicate vs Distribute)

So how do they help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern?

(Diagram: a sequence of network hops spread over time: Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centers)

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'

(Diagram: Transactions, Cashflows and MTMs in Partitioned Storage)

Stage 1: Get the right keys to query the Facts by joining the Dimensions in the Query Layer.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

(Diagram: Transactions, Cashflows and MTMs in Partitioned Storage)

Stage 2: Cluster Join to get the Facts: join Dimensions in the Query Layer, then join Facts across the cluster.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated.

(Diagram: Transactions, Cashflows and MTMs in Partitioned Storage)

Stage 3: Augment the raw Facts with the relevant Dimensions: join Dimensions in the Query Layer, join Facts across the cluster, then join Dimensions in the Query Layer again.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind the relevant dimensions to the result.
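
To make the three stages concrete, here is a small self-contained sketch. Plain in-memory maps stand in for the replicated dimension caches and the partitioned fact storage, and all the class names and toy data are illustrative rather than ODC's real model:

    import java.util.*;
    import java.util.stream.*;

    public class ThreeStageQuerySketch {

        // Replicated dimension data: small, present in every process.
        static final Map<String, String> sourceBookToCostCentre = Map.of(
                "BOOK-A", "CC1",
                "BOOK-B", "CC2");

        // Partitioned fact storage: big, spread across the cluster, but
        // collocated by tradeId (the common key).
        record Transaction(long tradeId, String sourceBook, double amount) {}
        record Mtm(long tradeId, double value) {}

        static final List<Transaction> transactions = List.of(
                new Transaction(1, "BOOK-A", 100.0),
                new Transaction(2, "BOOK-B", 250.0));
        static final Map<Long, Mtm> mtmsByTrade = Map.of(
                1L, new Mtm(1, 99.5),
                2L, new Mtm(2, 251.0));

        public static void main(String[] args) {
            // Stage 1: resolve the where clause (Cost Centre = 'CC1') purely
            // against replicated dimensions, with no network hops required.
            Set<String> booksInCC1 = sourceBookToCostCentre.entrySet().stream()
                    .filter(e -> e.getValue().equals("CC1"))
                    .map(Map.Entry::getKey)
                    .collect(Collectors.toSet());

            // Stage 2: join the facts. Transactions and MTMs share tradeId, so in
            // the grid this join runs inside each partition, in parallel.
            List<Map.Entry<Transaction, Mtm>> rows = transactions.stream()
                    .filter(t -> booksInCC1.contains(t.sourceBook()))
                    .map(t -> Map.entry(t, mtmsByTrade.get(t.tradeId())))
                    .collect(Collectors.toList());

            // Stage 3: bind the relevant dimension data onto the result.
            for (Map.Entry<Transaction, Mtm> row : rows) {
                Transaction t = row.getKey();
                System.out.printf("trade=%d book=%s costCentre=%s mtm=%.1f%n",
                        t.tradeId(), t.sourceBook(),
                        sourceBookToCostCentre.get(t.sourceBook()),
                        row.getValue().value());
            }
        }
    }

Stage 1 touches only replicated data, stage 2 is a collocated join inside a partition, and stage 3 is again local to the query layer, which is why no distributed join is ever needed.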

Bringing it together

(Diagram: a Java client API over Replicated Dimensions and Partitioned Facts)

We never have to do a distributed join.

So all the big stuff is held partitioned, and we can join without shipping keys around and having intermediate results.

We get to do this… (Diagram: normalised object graphs of Trade, Party and Trader)

…and this… (Diagram: the same graph held at Version 1 through Version 4)

…and this… (Diagram: many Trade, Party and Trader objects)

…without the problems of this… or this… all at the speed of this… well, almost.

But there is a fly in the ointment…

I lied earlier. These aren't all Facts.

(Diagram: Facts vs Dimensions. This one is a dimension: it has a different key to the Facts, and it's BIG)

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication Pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large. But Connected Dimension Data is tiny by comparison.

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date.

(Diagram: Data Layer and Processing Layer; Transactions, Cashflows and MTMs in Fact Storage (partitioned); Dimension Caches (replicated))

As new Facts are added, the relevant Dimensions that they reference are moved to the processing layer caches.

The Replicated Layer is updated by recursing through the arcs of the domain model when facts change.

Saving a trade causes all its 1st-level references to be triggered.

(Diagram: Save Trade into the Partitioned Cache; a Cache Store trigger follows the Trade's references to Party, Alias, Source, Book and Ccy from the Data Layer (all normalised) to the Query Layer (with connected dimension caches))

This updates the connected caches.

(Diagram: the referenced Party, Alias, Source, Book and Ccy dimensions now sit in the Query Layer's connected dimension caches)

The process recurses through the object graph.

(Diagram: the recursion continues to second-level references, e.g. a further Party and a LedgerBook, which are pulled into the connected dimension caches too)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
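
A minimal sketch of that recursion, with a toy domain model and a plain map standing in for the replicated 'connected' dimension cache (the real trigger and cache-store plumbing shown in the diagrams above is omitted):

    import java.util.*;

    public class ConnectedReplicationSketch {

        // Toy domain: a dimension may reference further dimensions.
        record Dimension(String key, List<Dimension> references) {}
        record Trade(long id, List<Dimension> references) {}

        // Stand-in for the replicated layer of 'connected' dimensions.
        static final Map<String, Dimension> connectedCache = new HashMap<>();

        // Called when a fact (a Trade) is saved: push every dimension it
        // references into the replicated cache, then recurse through the
        // foreign keys of those dimensions.
        static void onTradeSaved(Trade trade) {
            trade.references().forEach(ConnectedReplicationSketch::replicate);
        }

        static void replicate(Dimension d) {
            if (connectedCache.putIfAbsent(d.key(), d) == null) {
                // First time this dimension has been seen: follow its arcs too.
                d.references().forEach(ConnectedReplicationSketch::replicate);
            }
        }

        public static void main(String[] args) {
            Dimension ccy   = new Dimension("CCY:GBP", List.of());
            Dimension book  = new Dimension("BOOK:B1", List.of(ccy));
            Dimension party = new Dimension("PARTY:P1", List.of());
            onTradeSaved(new Trade(42, List.of(book, party)));

            // Only dimensions reachable from a saved fact end up replicated.
            System.out.println(connectedCache.keySet());
        }
    }

Dimensions that no fact references (a counterparty with no trades, say) are simply never visited, which is where the roughly ten-fold saving in replicated data comes from.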

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so that we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 81: Advanced databases   ben stopford

Binding them back together involves a ldquodistributed joinrdquo =gt

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 82: Advanced databases   ben stopford

Whereas the denormalised model the join is already

done

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 83: Advanced databases   ben stopford

So what we want is the advantages of a normalised store at the speed of a denormalised one

This is what using Snowflake Schemas and the Connected Replication pattern

is all about

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 84: Advanced databases   ben stopford

Looking more closely Why does normalisation mean we have to spread data around the cluster Why

canrsquot we hold it all together

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Partitioned Storage: Transactions, Cashflows, MTMs

Stage 1: Get the right keys to query the Facts — Join Dimensions in the Query Layer.
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Partitioned Storage: Transactions, Cashflows, MTMs

Stage 2: Cluster Join to get Facts — Join Dimensions in the Query Layer, then Join Facts across the cluster.
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated.

Partitioned Storage: Transactions, Cashflows, MTMs

Stage 3: Augment raw Facts with relevant Dimensions — Join Dimensions in the Query Layer, Join Facts across the cluster, then Join Dimensions in the Query Layer again.
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result.
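Pulling the three stages together, here is a self-contained Java sketch with hypothetical types and toy data (plain maps stand in for the replicated dimension caches and the partitioned fact storage; a real ODC query would go through Coherence): Stage 1 resolves the where clause against replicated dimensions to get fact keys, Stage 2 joins the collocated facts, and Stage 3 binds dimension data to the result.

// Sketch (Java): the three query stages over toy data
import java.util.*;
import java.util.stream.*;

public class SnowflakeQuerySketch {

    record Transaction(long tradeId, String bookId, double amount) {}
    record Mtm(long tradeId, double value) {}
    record Row(Transaction txn, Mtm mtm, String costCentre) {}

    public static void main(String[] args) {
        // Replicated dimension data: every process holds a full copy.
        Map<String, String> bookToCostCentre = Map.of("B1", "CC1", "B2", "CC2");

        // Partitioned fact data: in reality spread across the cluster, keyed by tradeId.
        Map<Long, Transaction> transactions = Map.of(
                1L, new Transaction(1L, "B1", 100.0),
                2L, new Transaction(2L, "B2", 250.0));
        Map<Long, Mtm> mtms = Map.of(
                1L, new Mtm(1L, 5.0),
                2L, new Mtm(2L, -3.0));

        // Stage 1: join dimensions locally to turn "Cost Centre = 'CC1'" into fact keys.
        Set<String> booksInCC1 = bookToCostCentre.entrySet().stream()
                .filter(e -> "CC1".equals(e.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());

        // Stage 2: join the facts; Transactions and MTMs share the tradeId
        // partitioning key, so each join happens within one partition.
        List<Row> rows = transactions.values().stream()
                .filter(t -> booksInCC1.contains(t.bookId()))
                .map(t -> new Row(t, mtms.get(t.tradeId()),
                        // Stage 3: bind the relevant dimension data to the result.
                        bookToCostCentre.get(t.bookId())))
                .toList();

        rows.forEach(System.out::println);
    }
}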

Bringing it together: a Java client API over Replicated Dimensions and Partitioned Facts.

We never have to do a distributed join.

So all the big stuff is held partitioned, and we can join without shipping keys around and having intermediate results.

We get to do this… (a Trade joined to its Party and Trader)
…and this… (a Trade with its Party and Trader at Versions 1, 2, 3 and 4)
…and this… (many Trades joined to many Parties and Traders)

…without the problems of this…
…or this…
…all at the speed of this… well, almost.

But there is a fly in the ointment…

I lied earlier. These aren't all Facts.

Facts / Dimensions

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication Pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store, then a Trade Query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large.

But Connected Dimension Data is tiny by comparison.

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date.

Data Layer / Processing Layer
Dimension Caches (Replicated)
Fact Storage (Partitioned): Transactions, Cashflows, MTMs

As new Facts are added, relevant Dimensions that they reference are moved to the processing-layer caches.

The Replicated Layer is updated by recursing through the arcs of the domain model when facts change.

Saving a trade causes all its 1st-level references to be triggered.

Save Trade → Partitioned Cache → Cache Store → Trigger
Data Layer (All Normalised): Trade, Party, Alias, SourceBook, Ccy
Query Layer (with connected dimension Caches)

This updates the connected caches

Data Layer (All Normalised): Trade, Party, Alias, SourceBook, Ccy
Query Layer (with connected dimension Caches)

The process recurses through the object graph

Data Layer (All Normalised): Trade, Party, Alias, SourceBook, Ccy, Party, LedgerBook
Query Layer (with connected dimension Caches)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
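A minimal sketch of that recursion, with hypothetical types (the real ODC version hangs off a cache-store trigger in the data layer): when a fact is saved, walk its references, copy any dimension not already in the replicated 'connected' caches, and recurse into that dimension's own references.

// Sketch (Java): the Connected Replication recursion
import java.util.*;

public class ConnectedReplicationSketch {

    // A node in the domain model: an entity plus the entities it references.
    interface Entity {
        String key();
        List<Entity> references();   // the "arcs" of the domain model
    }

    // Replicated, connected dimension caches, keyed by entity type then key.
    private final Map<Class<?>, Map<String, Entity>> connectedCaches = new HashMap<>();

    // Called by the (hypothetical) trigger when a fact is written to the data layer.
    public void onFactSaved(Entity fact) {
        for (Entity dimension : fact.references()) {
            replicateIfAbsent(dimension);
        }
    }

    private void replicateIfAbsent(Entity dimension) {
        Map<String, Entity> cache =
                connectedCaches.computeIfAbsent(dimension.getClass(), c -> new HashMap<>());
        if (cache.putIfAbsent(dimension.key(), dimension) == null) {
            // Newly connected: recurse through its own references (Party -> LedgerBook, ...).
            for (Entity next : dimension.references()) {
                replicateIfAbsent(next);
            }
        }
    }
}

Because the recursion stops at dimensions that are already replicated, saving many similar trades adds little extra work, and only the connected fraction of the dimension data ever reaches the replicated caches.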

Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End
• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 85: Advanced databases   ben stopford

Itrsquos all about the keys

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 86: Advanced databases   ben stopford

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 87: Advanced databases   ben stopford

We tackle this problem with a hybrid model

Trade

PartyTrader

Partitioned

Replicated

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 88: Advanced databases   ben stopford

We adapt the concept of a Snowflake Schema

Taking the concept of Facts and Dimensions

Everything starts from a Core Fact (Trades for us)

Facts are Big dimensions are small

Facts have one key that relates them all (used to

partition)

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

Dimensions have many keys

(which crosscut the partitioning key)

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Facts are partitioned, dimensions are replicated:
Data Layer / Query Layer — Transactions, Cashflows and MTMs sit in Fact Storage (Partitioned); Trade is partitioned with them, Party/Trader is replicated.

Facts are partitioned, dimensions are replicated (2):
Facts (distribute/partition): Transactions, Cashflows, MTMs in Fact Storage (Partitioned); Dimensions (replicate) sit alongside on every node.

The data volumes back this up as a sensible hypothesis:

Facts => Big => Distribute
Dimensions => Small => Replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.
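A minimal sketch of what that prescription might look like from a client, assuming a Coherence cache configuration that maps the fact caches to a distributed (partitioned) scheme and the dimension caches to a replicated scheme; the cache names are invented for the example:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;

public class OdcStores {
    public static void main(String[] args) {
        // Facts: big, related via the trade/partitioning key => partitioned
        // (mapped to a distributed-scheme in the cache configuration).
        NamedCache trades    = CacheFactory.getCache("facts-trades");
        NamedCache mtms      = CacheFactory.getCache("facts-mtms");
        NamedCache cashflows = CacheFactory.getCache("facts-cashflows");

        // Dimensions: small, with crosscutting keys => replicated to every node
        // (mapped to a replicated-scheme in the cache configuration).
        NamedCache parties     = CacheFactory.getCache("dims-party");
        NamedCache traders     = CacheFactory.getCache("dims-trader");
        NamedCache costCentres = CacheFactory.getCache("dims-costcentre");

        System.out.println("fact and dimension caches obtained");
    }
}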

So how does this help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern?

Get Cost Centers => Get LedgerBooks => Get SourceBooks => Get Transactions => Get MTMs => Get Legs => Get Cost Centers — a chain of calls, each one a network hop, spread out over time.

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'

Stage 1: Get the right keys to query the Facts — join Dimensions in the Query Layer.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

(Transactions, Cashflows and MTMs stay in Partitioned Storage throughout.)

Stage 2: Cluster Join to get Facts — join Facts across the cluster.

Stage 2: Join the facts together efficiently, as we know they are collocated.

Stage 3: Augment raw Facts with relevant Dimensions — join Dimensions in the Query Layer again.

Stage 3: Bind relevant dimensions to the result.
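A rough Java sketch of the three stages, assuming the partitioned/replicated caches above; the cache names, getter names and overall shape are invented for illustration and are not ODC's real API:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.filter.EqualsFilter;
import com.tangosol.util.filter.InFilter;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of: Select Transaction, MTM, ReferenceData Where Cost Centre = 'CC1'
public class CostCentreQuery {

    @SuppressWarnings({"unchecked", "rawtypes"})
    public static void main(String[] args) {
        // Stage 1: the where clause runs against the replicated dimension caches
        // in the query layer (all local reads): find the SourceBook keys that
        // belong to cost centre 'CC1'.
        NamedCache books = CacheFactory.getCache("dims-sourcebook");
        Set bookIds = books.keySet(new EqualsFilter("getCostCentre", "CC1"));

        // Stage 2: cluster join to get the Facts. The filter fans out over the
        // partitioned fact caches in parallel; Transactions, Cashflows and MTMs
        // that share a trade id are collocated, so each partition joins locally.
        NamedCache transactions = CacheFactory.getCache("facts-transactions");
        Set txEntries = transactions.entrySet(new InFilter("getSourceBookId", bookIds));

        // Stage 3: bind the relevant Dimensions (replicated, hence also local)
        // to the raw Facts to build the result.
        NamedCache refData = CacheFactory.getCache("dims-refdata");
        for (Object o : txEntries) {
            Map.Entry entry = (Map.Entry) o;
            // ... assemble Transaction + collocated MTM + reference data into a row
        }
        System.out.println(txEntries.size() + " fact rows matched");
    }
}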

Bringing it together: a Java client API over Replicated Dimensions and Partitioned Facts.

We never have to do a distributed join. So all the big stuff is held partitioned, and we can join without shipping keys around or holding intermediate results.

We get to do this… (join a Trade to its Party and Trader)

…and this… (keep Trade, Party and Trader across Versions 1, 2, 3 and 4)

…and this… (many Trades, Parties and Traders joined together)

…without the problems of this… or this…

…all at the speed of this… well, almost.

But there is a fly in the ointment…

I lied earlier. These aren't all Facts.

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.

Fortunately there is a simple solution: the Connected Replication Pattern.

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large. But Connected Dimension Data is tiny by comparison.

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date.

(Data Layer: Transactions, Cashflows and MTMs in Fact Storage (Partitioned). Processing Layer: Dimension Caches (Replicated).)

As new Facts are added, relevant Dimensions that they reference are moved to the processing layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.

Saving a trade causes all its 1st-level references to be triggered: Trade => Party Alias, Source Book, Ccy. The save into the partitioned cache (Data Layer, all normalised) fires a trigger via the cache store, feeding the Query Layer's connected dimension caches.

This updates the connected caches: Party Alias, Source Book and Ccy now appear in the Query Layer (with connected dimension caches), while the Data Layer stays fully normalised.

The process recurses through the object graph: from the trade's first-level references (Party Alias, Source Book, Ccy) on to Party and LedgerBook, each landing in the Query Layer's connected dimension caches.

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
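A minimal sketch of the idea (not ODC's implementation; the Entity interface and cache names are invented): when a fact is saved, walk its foreign-key arcs and copy every dimension reached into the replicated 'connected' caches:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of Connected Replication: saving a fact triggers a
// recursive walk of its references, replicating each dimension it touches.
public class ConnectedReplicator {

    /** Minimal stand-in for a domain entity that can list its FK references. */
    public interface Entity {
        Object key();
        String dimensionCacheName();      // e.g. "dims-party"
        Collection<Entity> references();  // the arcs on the domain model
    }

    /** Called (e.g. from a cache-store trigger) when a fact such as a Trade is saved. */
    public void onFactSaved(Entity fact) {
        Set<Object> visited = new HashSet<Object>();
        for (Entity dimension : fact.references()) {
            replicateConnected(dimension, visited);
        }
    }

    private void replicateConnected(Entity dimension, Set<Object> visited) {
        if (!visited.add(dimension.key())) {
            return;                                   // already visited on this walk
        }
        NamedCache cache = CacheFactory.getCache(dimension.dimensionCacheName());
        cache.put(dimension.key(), dimension);        // it is now 'connected' => replicate it
        for (Entity next : dimension.references()) {  // recurse through the object graph
            replicateConnected(next, visited);
        }
    }
}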

Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so we can do any join in a single step against partitioned storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End
• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 94: Advanced databases   ben stopford

Looking at the data

Facts=gtBig common keys

Dimensions=gtSmallcrosscutting Keys

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 95: Advanced databases   ben stopford

We remember we are a grid We should avoid the

distributed join

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 96: Advanced databases   ben stopford

hellip so we only want to lsquojoinrsquo data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (eg

KeyAssociation in Coherence)

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 97: Advanced databases   ben stopford

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Facts are partitioned dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

Page 98: Advanced databases   ben stopford

Facts are partitioned dimensions are replicated

Data Layer

Transactions

Cashflows

Query Layer

Mtms

Fact Storage(Partitioned)

Trade

Party

Trader

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions (replicate)

Mtms

Fact Storage(Partitioned)

Facts (distribute / partition)

The data volumes back this up as a sensible hypothesis

Facts => Big => Distribute

Dimensions => Small => Replicate

Key Point

We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.

Replicate

Distribute
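As a toy sketch of that prescription (hypothetical classes, not a specific grid product's API): facts route to a partition derived from the shared partitioning key, while dimensions sit in a map that every node carries a full copy of.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class StoragePrescriptionSketch {
    private final int partitionCount;
    private final List<Map<Object, Object>> factPartitions = new ArrayList<>();
    private final Map<Object, Object> replicatedDimensions = new ConcurrentHashMap<>();

    StoragePrescriptionSketch(int partitionCount) {
        this.partitionCount = partitionCount;
        for (int i = 0; i < partitionCount; i++) {
            factPartitions.add(new HashMap<>());
        }
    }

    // Facts (Trades, MTMs, Cashflows) share one partitioning key, so related
    // facts land in the same partition and can be joined locally.
    void putFact(Object partitioningKey, Object factKey, Object fact) {
        int p = Math.floorMod(partitioningKey.hashCode(), partitionCount);
        factPartitions.get(p).put(factKey, fact);
    }

    // Dimensions are small, so every node would hold the full copy.
    void putDimension(Object dimensionKey, Object dimension) {
        replicatedDimensions.put(dimensionKey, dimension);
    }
}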

So how do they help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

What would this look like without this pattern

Get Cost Centers → Get LedgerBooks → Get SourceBooks → Get Transactions → Get MTMs → Get Legs → Get Cost Centers (one network hop after another, spread over time)

But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1: Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2: Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3: Augment raw Facts with relevant Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and having intermediate results.
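A hedged sketch of those three stages in Java, using hypothetical types and plain maps in place of the real client API and grid caches:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ThreeStageQuerySketch {

    record Trade(Object tradeKey, Object refDataKey) {}   // partitioned fact
    record Mtm(Object tradeKey, double value) {}          // partitioned fact
    record Row(Trade trade, Mtm mtm, Object refData) {}   // joined result

    // Stage 1: resolve the where clause against replicated dimensions to get
    // the partitioning keys (trade keys) of the facts we actually need.
    static Set<Object> stage1(Map<Object, String> costCentreByTradeKey, String costCentre) {
        Set<Object> keys = new HashSet<>();
        costCentreByTradeKey.forEach((key, cc) -> {
            if (cc.equals(costCentre)) {
                keys.add(key);
            }
        });
        return keys;
    }

    // Stage 2: join the facts; because Trades and MTMs share the partitioning
    // key, this join runs inside each partition rather than across the cluster.
    // Stage 3: bind the replicated reference data to each row in the query layer.
    static List<Row> stages2And3(Set<Object> keys,
                                 Map<Object, Trade> trades,
                                 Map<Object, Mtm> mtms,
                                 Map<Object, Object> referenceData) {
        List<Row> result = new ArrayList<>();
        for (Object key : keys) {
            Trade trade = trades.get(key);
            Mtm mtm = mtms.get(key);
            if (trade != null && mtm != null) {
                result.add(new Row(trade, mtm, referenceData.get(trade.refDataKey())));
            }
        }
        return result;
    }
}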

We get to do this…

Trade

Party

Trader

Trade

Party

Trader

…and this…

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

…without the problems of this…

…or this

all at the speed of this… well, almost

But there is a fly in the ointment…

I lied earlier. These aren't all Facts.

Facts

Dimensions

This is a dimension:
• It has a different key to the Facts
• And it's BIG

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication Pattern

Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store, then a Trade Query will never need the Goldmans Counterparty.

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.

As data is written to the data store we keep our 'Connected Caches' up to date.

Data Layer

Dimension Caches (Replicated)

Transactions

Cashflows

Processing Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all its 1st level references to be triggered.
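A minimal sketch of that write-path hook, assuming hypothetical cache-store and trigger interfaces rather than a particular product's API:

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shapes: a fact knows its first-level references; a dimension has a key.
interface FactEntity {
    Set<DimensionEntity> firstLevelReferences();   // e.g. Party, Book, Ccy for a Trade
}

interface DimensionEntity {
    Object key();
}

final class ConnectedCacheTrigger {
    private final Map<Object, DimensionEntity> replicatedDimensionCache = new ConcurrentHashMap<>();

    // Invoked by the partitioned cache's store/trigger when a fact is saved;
    // deeper arcs are then followed by the same recursion sketched earlier.
    void onFactSaved(FactEntity fact) {
        for (DimensionEntity d : fact.firstLevelReferences()) {
            replicatedDimensionCache.put(d.key(), d);   // keep the connected caches up to date
        }
    }
}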

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach

• Data set size: the size of connected dimensions limits scalability

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared-nothing architectures. These favour scalability.

Conclusion

At the other end are in-memory architectures, ideally using a single address space.

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning, we can do any join in a single step.

Partitioned Storage

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com

• Questions?

Page 99: Advanced databases   ben stopford

Facts are partitioned dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute partition)

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 100: Advanced databases   ben stopford

The data volumes back this up as a sensible hypothesis

Facts=gtBig=gtDistrib

ute

Dimensions=gtSmall =gt Replicate

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 101: Advanced databases   ben stopford

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff whorsquos keys canrsquot map to our

partitioning key

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 102: Advanced databases   ben stopford

Replicate

Distribute

So how does they help us to run queries without

distributed joins

This query involvesbull Joins between Dimensionsbull Joins between Facts

Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

What would this look like without this pattern

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 105: Advanced databases   ben stopford

But by balancing Replication and Partitioning we donrsquot need all

those hops

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 106: Advanced databases   ben stopford

Stage 1 Focus on the where clause

Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 107: Advanced databases   ben stopford

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1 Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2 Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 2 Join the facts together effi ciently as we know they are

collocated

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3 Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo

Stage 3 Bind relevant dimensions to the result

Bringing it together

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

Page 113: Advanced databases   ben stopford

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our 'Connected Caches' up to date.

[diagram: Data Layer and Processing Layer — Dimension Caches (Replicated); Fact Storage (Partitioned) holding Transactions, Cashflows and MTMs]

As new Facts are added, relevant Dimensions that they reference are moved to the processing-layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.

Saving a trade causes all its 1st-level references to be triggered.

[diagram: Save Trade → Cache Store trigger on the Partitioned Cache; the Trade's references — Party, Alias, Source, Book, Ccy — flow from the Data Layer (All Normalised) to the Query Layer (with connected dimension Caches)]

This updates the connected caches

[diagram: Trade → Party, Alias, Source, Book, Ccy — Data Layer (All Normalised), Query Layer (with connected dimension Caches)]

The process recurses through the object graph

[diagram: Trade → Party, Alias, Source, Book, Ccy; Party → LedgerBook — Data Layer (All Normalised), Query Layer (with connected dimension Caches)]

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
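As a rough illustration of the recursion just described (illustrative Java, not ODC's implementation): when a fact is saved, its foreign-key references are walked; any dimension not already in the replicated 'connected' cache is added and its own references are followed in turn, so only dimensions actually reachable from a fact get replicated.

```java
import java.util.*;

// Sketch of the Connected Replication recursion; types and names are
// assumptions for illustration, not ODC code.
class ConnectedReplicator {
    interface Entity {
        String key();
        List<Entity> references(); // the arcs (foreign keys) on the domain model
    }

    // Stands in for the replicated 'connected dimension' caches.
    private final Map<String, Entity> connectedCache = new HashMap<>();

    // Called by the trigger that fires when a fact (e.g. a Trade) is saved.
    void onFactSaved(Entity fact) {
        for (Entity referenced : fact.references()) {
            replicateIfNew(referenced);
        }
    }

    private void replicateIfNew(Entity dimension) {
        if (connectedCache.containsKey(dimension.key())) {
            return;                                      // already connected: stop here
        }
        connectedCache.put(dimension.key(), dimension);  // promote to replicated layer
        for (Entity next : dimension.references()) {     // recurse: Party -> LedgerBook, ...
            replicateIfNew(next);
        }
    }
}
```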

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach

• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported) — see the sketch below.
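To illustrate that second limitation (a hedged sketch with hypothetical fact types, not ODC's real schema): two fact types can only be cluster-joined if they are routed by the same partitioning key, so matching records are guaranteed to land in the same process; a big dimension keyed differently crosscuts that routing, which is why it must be replicated instead.

```java
// Hypothetical fact types that share a partitioning key (the trade id),
// so records with the same tradeId always land on the same node and the
// join between them never crosses the network.
record TradeFact(long tradeId, String book) {}
record MtmFact(long tradeId, double value) {}

// A big Dimension keyed by something else (e.g. party id) crosscuts this
// routing: it cannot be co-located with every trade that references it.
record PartyDimension(String partyId, String name) {}

class PartitionRouter {
    private final int partitionCount;

    PartitionRouter(int partitionCount) { this.partitionCount = partitionCount; }

    // Both fact types route on tradeId, giving them partition affinity.
    int partitionFor(long tradeId) {
        return Math.floorMod(Long.hashCode(tradeId), partitionCount);
    }
}
```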

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step against Partitioned Storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com
• Questions?

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 114: Advanced databases   ben stopford

We get to do thishellip

Trade

Party

Trader

Trade

Party

Trader

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 115: Advanced databases   ben stopford

hellipand thishellip

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 116: Advanced databases   ben stopford

and this

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 117: Advanced databases   ben stopford

hellipwithout the problems of thishellip

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 118: Advanced databases   ben stopford

hellipor this

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 119: Advanced databases   ben stopford

all at the speed of thishellip well almost

But there is a fly in the ointmenthellip

I lied earlier These arenrsquot all Facts

Facts

Dimensions

This is a dimensionbull It has a different

key to the Factsbull And itrsquos BIG

We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem

Fortunately there is a simple solution

The Connected Replication

Pattern

Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our 'Connected Caches' up to date.

(Diagram: Data Layer and Processing Layer, with Dimension Caches (Replicated) alongside Transactions, Cashflows and MTMs held in Fact Storage (Partitioned))
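A minimal sketch of that storage split, using plain in-process maps as a stand-in for the grid's caches (the names are illustrative, not ODC's API):

import java.util.*;

// Facts are spread across partitions by their single partitioning key, while the small,
// connected Dimensions are copied in full so every partition can read them locally.
class GridStorageSketch<F, D> {

    private final int partitionCount;
    private final List<Map<String, F>> partitionedFacts = new ArrayList<>();   // Fact Storage (Partitioned)
    private final Map<String, D> replicatedDimensions = new HashMap<>();       // Dimension Caches (Replicated)

    GridStorageSketch(int partitionCount) {
        this.partitionCount = partitionCount;
        for (int i = 0; i < partitionCount; i++) {
            partitionedFacts.add(new HashMap<>());
        }
    }

    // Facts (Trades, Cashflows, MTMs) are routed to a partition by the partitioning key,
    // so Facts sharing that key always end up in the same process.
    void putFact(String partitioningKey, String factKey, F fact) {
        int partition = Math.floorMod(partitioningKey.hashCode(), partitionCount);
        partitionedFacts.get(partition).put(factKey, fact);
    }

    // Dimensions are replicated everywhere, so a 'join' to them never leaves the process.
    void putDimension(String dimensionKey, D dimension) {
        replicatedDimensions.put(dimensionKey, dimension);
    }

    D localDimensionLookup(String dimensionKey) {
        return replicatedDimensions.get(dimensionKey);
    }
}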

As new Facts are added, relevant Dimensions that they reference are moved to processing-layer caches.

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.

Saving a trade causes all its 1st-level references to be triggered.

(Diagram: Save Trade -> Partitioned Cache -> Cache Store -> Trigger; the Trade references Party Alias, Source Book and Ccy; Data Layer (All Normalised), Query Layer (with connected dimension Caches))

This updates the connected caches.

(Diagram: Trade, Party Alias, Source Book, Ccy; Data Layer (All Normalised), Query Layer (with connected dimension Caches))

The process recurses through the object graph.

(Diagram: Trade -> Party Alias -> Party; Trade -> Source Book -> LedgerBook; Trade -> Ccy; Data Layer (All Normalised), Query Layer (with connected dimension Caches))
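A minimal sketch of that trigger-driven recursion, assuming a generic Entity abstraction over the domain model's foreign keys (the types and method names are hypothetical, not ODC's API):

import java.util.*;

// When a Fact is written to the partitioned layer, a trigger walks its foreign-key
// references; each referenced dimension is pushed to the replicated caches and the walk
// recurses through that dimension's own references
// (Trade -> Party Alias -> Party, Trade -> Source Book -> LedgerBook, ...).
class ConnectedReplicationTrigger {

    interface Entity {
        String key();
        List<Entity> references();   // the 'arcs' of the domain model (foreign keys)
    }

    private final Map<String, Entity> replicatedDimensionCaches = new HashMap<>();

    // Invoked by the cache store when a Fact (e.g. a Trade) is saved.
    void onFactSaved(Entity fact) {
        Set<String> visited = new HashSet<>();
        for (Entity firstLevelReference : fact.references()) {
            replicateConnected(firstLevelReference, visited);
        }
    }

    private void replicateConnected(Entity dimension, Set<String> visited) {
        if (!visited.add(dimension.key())) {
            return;                                                  // guard against cycles in the graph
        }
        replicatedDimensionCaches.put(dimension.key(), dimension);   // ensure it is in the replicated layer
        for (Entity next : dimension.references()) {
            replicateConnected(next, visited);                       // recurse through the object graph
        }
    }
}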

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Limitations of this approach

• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.

• At one end of the scale are the huge shared-nothing architectures. These favour scalability.

• At the other end are in-memory architectures, ideally using a single address space.

• You can blend the two approaches (for example ODC).

• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so that any join can be done in a single step against the partitioned storage.

• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.

The End

• Further details online: http://www.benstopford.com
• Questions?

  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 126: Advanced databases   ben stopford

Looking at the Dimension data some are quite large

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 127: Advanced databases   ben stopford

But Connected Dimension Data is tiny by comparison

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 128: Advanced databases   ben stopford

One recent independent study from the database community showed that 80 of data remains unused

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 129: Advanced databases   ben stopford

So we only replicate

lsquoConnectedrsquo or lsquoUsedrsquo dimensions

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 130: Advanced databases   ben stopford

As data is written to the data store we keep our lsquoConnected Cachesrsquo up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 131: Advanced databases   ben stopford

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the

domain model ensuring only lsquoConnectedrsquo dimensions are

replicated

With lsquoConnected Replicationrsquo only 110th of the data

needs to be replicated (on

average)

Limitations of this approach

bullData set size Size of connected dimensions limits scalability

bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)

Conclusion

bull Traditional database architectures are inappropriate for very low latency or very high throughput applications

Conclusion

At one end of the scale are the huge shared nothing architectures These favour scalability

Conclusion

At the other end are in memory architectures ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage

Conclusion

With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern

The Endbull Further details online

httpwwwbenstopfordcom

bull Questions

  • Data Storage for Extreme Use Cases The Lay of the Land and a P
  • How fast is a HashMap lookup
  • Thatrsquos how long it takes light to travel a room
  • How fast is a database lookup
  • Thatrsquos how long it takes light to go to Australia and back
  • Slide 6
  • Computers really are very fast
  • The problem is wersquore quite good at writing software that slows
  • Question Is it fair to compare the performance of a Database
  • Of course nothellip
  • Mechanical Sympathy
  • Key Point 1
  • Slide 13
  • Slide 14
  • Times are changing
  • Traditional Database Architecture is Aging
  • Slide 17
  • The Traditional Architecture
  • Slide 19
  • Key Point 2
  • Slide 21
  • How big is the internet
  • How big is an average enterprise database
  • Slide 24
  • Simplifying the Contract
  • Databases have huge operational overheads
  • Avoid that overhead with a simpler contract and avoiding IO
  • Key Point 3
  • Key Point 3 (addendum)
  • Slide 30
  • 1 The Shared Disk Architecture
  • 2 The Shared Nothing Architecture
  • Each machine is responsible for a subset of the records Each r
  • 3 The In Memory Database (single address-space)
  • Databases must cache subsets of the data in memory
  • Not knowing what you donrsquot know
  • If you can fit it ALL in memory you know everything
  • The architecture of an in memory database
  • Memory is at least 100x faster than disk
  • Random vs Sequential Access
  • This makes them very fast
  • The proof is in the stats TPC-H Benchmarks on a 1TB data set
  • So why havenrsquot in-memory databases taken off
  • Address-Spaces are relatively small and of a finite fixed size
  • Durability
  • Slide 46
  • Distributed In Memory (Shared Nothing)
  • Again we spread our data but this time only using RAM
  • Distribution solves our two problems
  • We get massive amounts of parallel processing
  • Slide 51
  • Slide 52
  • Key Point 4 There are three key forces
  • Slide 54
  • ODC
  • Slide 56
  • What is Latency
  • What is Throughput
  • Which is best for latency
  • Which is best for throughput
  • So why do we use distributed in-memory
  • ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
  • The Layers
  • Three Tools of Distributed Data Architecture
  • How should we use these tools
  • Replication puts data everywhere
  • Partitioning scales
  • So we have some data Our data is bound together in a model
  • Which we save
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot
  • The hops have to be spread over time
  • Lots of network hops makes it slow
  • OK ndash what if we held it all together ldquoDenormalisedrdquo
  • Hence denormalisation is FAST (for reads)
  • Denormalisation implies the duplication of some sub-entities
  • hellipand that means managing consistency over lots of copies
  • hellipand all the duplication means you run out of space really quic
  • Spaces issues are exaggerated further when data is versioned
  • And reconstituting a previous time slice becomes very difficult
  • Slide 80
  • Remember this means the object graph will be split across multi
  • Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
  • Whereas the denormalised model the join is already done
  • So what we want is the advantages of a normalised store at the
  • Looking more closely Why does normalisation mean we have to sp
  • Itrsquos all about the keys
  • We can collocate data with common keys but if they crosscut the
  • We tackle this problem with a hybrid model
  • We adapt the concept of a Snowflake Schema
  • Taking the concept of Facts and Dimensions
  • Everything starts from a Core Fact (Trades for us)
  • Facts are Big dimensions are small
  • Facts have one key that relates them all (used to partition)
  • Dimensions have many keys (which crosscut the partitioning key
  • Looking at the data
  • We remember we are a grid We should avoid the distributed join
  • hellip so we only want to lsquojoinrsquo data that is in the same process
  • So we prescribe different physical storage for Facts and Dimens
  • Facts are partitioned dimensions are replicated
  • Facts are partitioned dimensions are replicated (2)
  • The data volumes back this up as a sensible hypothesis
  • Key Point
  • Slide 103
  • So how does they help us to run queries without distributed joi
  • What would this look like without this pattern
  • But by balancing Replication and Partitioning we donrsquot need all
  • Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
  • Stage 1 Get the right keys to query the Facts
  • Stage 2 Cluster Join to get Facts
  • Stage 2 Join the facts together efficiently as we know they ar
  • Stage 3 Augment raw Facts with relevant Dimensions
  • Stage 3 Bind relevant dimensions to the result
  • Bringing it together
  • Slide 114
  • We get to do thishellip
  • hellipand thishellip
  • and this
  • hellipwithout the problems of thishellip
  • hellipor this
  • all at the speed of thishellip well almost
  • Slide 121
  • But there is a fly in the ointmenthellip
  • I lied earlier These arenrsquot all Facts
  • We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
  • Fortunately there is a simple solution
  • Whilst there are lots of these big dimensions a large majority
  • If there are no Trades for Goldmans in the data store then a Tr
  • Looking at the Dimension data some are quite large
  • But Connected Dimension Data is tiny by comparison
  • One recent independent study from the database community showed
  • Slide 131
  • As data is written to the data store we keep our lsquoConnected Cac
  • The Replicated Layer is updated by recursing through the arcs o
  • Saving a trade causes all itrsquos 1st level references to be trigg
  • This updates the connected caches
  • The process recurses through the object graph
  • Slide 137
  • Slide 138
  • Limitations of this approach
  • Conclusion
  • Conclusion (2)
  • Conclusion (3)
  • Conclusion (4)
  • Conclusion (5)
  • Conclusion (6)
  • Conclusion (7)
  • The End
Page 132: Advanced databases   ben stopford

Saving a trade causes all itrsquos 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

Page 133: Advanced databases   ben stopford

This updates the connected caches.

[Slide diagram: the Trade's referenced dimensions (Party, Alias, Source, Book, Ccy) flow from the Data Layer (All Normalised) into the Query Layer (With connected dimension Caches).]

Page 134: Advanced databases   ben stopford

The process recurses through the object graph.

[Slide diagram: the recursion continues from the Trade's direct references (Party, Alias, Source, Book, Ccy) to their own references (a further Party, a LedgerBook), each pulled from the Data Layer (All Normalised) into the Query Layer (With connected dimension Caches).]
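A minimal sketch of that recursion, again in plain Java with hypothetical names (DomainObject, references(), replicatedStore, not taken from any real grid API), might look like this: starting from a saved fact it walks the foreign-key references and replicates every dimension it reaches, with a visited set guarding against cycles in the graph.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the Connected Replication walk: follow foreign-key
// references from a saved fact and replicate every dimension that is reached.
public class ConnectedReplicationWalker {

    /** A domain object exposes its key and its outgoing foreign-key references. */
    public interface DomainObject {
        Object key();
        List<DomainObject> references();
    }

    private final Map<Object, DomainObject> replicatedStore = new ConcurrentHashMap<>();

    /** Called when a fact (e.g. a Trade) is written to the partitioned layer. */
    public void onFactSaved(DomainObject fact) {
        Set<Object> visited = new HashSet<>();
        for (DomainObject ref : fact.references()) {
            replicateRecursively(ref, visited);
        }
    }

    private void replicateRecursively(DomainObject dim, Set<Object> visited) {
        if (!visited.add(dim.key())) {
            return;                            // already handled on this pass; avoids cycles
        }
        replicatedStore.put(dim.key(), dim);   // the dimension is now 'connected'
        for (DomainObject next : dim.references()) {
            replicateRecursively(next, visited);
        }
    }
}
```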

Page 135: Advanced databases   ben stopford

'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.

Page 136: Advanced databases   ben stopford

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).

Page 137: Advanced databases   ben stopford

Limitations of this approach:

• Data set size: the size of the connected dimensions limits scalability.

• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported); see the sketch below.
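To illustrate the partitioning-key constraint, the sketch below uses illustrative assumptions only (the Settlement fact type, the choice of trade id as the affinity key, and the modulo routing are not taken from ODC): two fact types that derive the same affinity key always land in the same partition, so a join between them never has to leave the process.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: two fact types (Trade and a hypothetical Settlement) share
// an affinity key (the trade id) and are therefore routed to the same partition,
// so joining them is a purely local operation. Routing by hashCode % partitions
// is an assumption for the sketch, not a description of any particular product.
public class PartitionAffinityExample {

    record Trade(String tradeId, String costCentre, double notional) {}
    record Settlement(String settlementId, String tradeId, String status) {}

    static int partitionOf(String affinityKey, int partitionCount) {
        return Math.floorMod(affinityKey.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        int partitions = 4;
        List<Trade> trades = List.of(new Trade("T1", "CC1", 1_000_000));
        List<Settlement> settlements = List.of(new Settlement("S1", "T1", "PENDING"));

        // Both facts derive their affinity key from the trade id...
        Map<String, Integer> placement = new HashMap<>();
        trades.forEach(t -> placement.put("Trade:" + t.tradeId(),
                partitionOf(t.tradeId(), partitions)));
        settlements.forEach(s -> placement.put("Settlement:" + s.settlementId(),
                partitionOf(s.tradeId(), partitions)));

        // ...so Trade T1 and Settlement S1 land in the same partition, and a
        // Trade-to-Settlement join can be answered without a network hop.
        System.out.println(placement);
    }
}
```

Facts that cannot derive such a common key would have to be joined across partitions, which is exactly the distributed join the architecture is trying to avoid.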

Page 138: Advanced databases   ben stopford

Conclusion

• Traditional database architectures are inappropriate for very low latency or very high throughput applications.

Page 139: Advanced databases   ben stopford

Conclusion

At one end of the scale are the huge shared nothing architectures. These favour scalability.

Page 140: Advanced databases   ben stopford

Conclusion

At the other end are in-memory architectures, ideally using a single address space

Conclusion

You can blend the two approaches (for example ODC)

Conclusion

ODC attacks the Distributed Join Problem in an unusual way

Conclusion

By balancing Replication and Partitioning so we can do any join in a single step

Partitioned Storage
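To make "any join in a single step" concrete, here is a minimal sketch, assuming a toy shared-nothing grid in plain Java rather than the real ODC/Coherence API: each node owns its partition of the fact data (Trades) and a full replica of the small dimension data, so a query binds facts to dimensions entirely in-process. The Trade, cost-centre and currency names are stand-ins for whatever the real fact and dimension types are.

```java
import java.util.*;
import java.util.stream.*;

// A minimal sketch (not the ODC/Coherence API): one node of a shared-nothing grid
// holding its partition of the facts plus a full replica of the small dimensions.
public class PartitionNode {

    record Trade(long id, String costCentre, String ccy, double notional) {}

    // Facts are partitioned: this node only stores the trades routed to it.
    private final Map<Long, Trade> trades = new HashMap<>();
    // Dimensions are replicated: every node holds a complete copy of these.
    private final Map<String, String> costCentreToBook = new HashMap<>();
    private final Map<String, Double> ccyToUsdRate = new HashMap<>();

    void putTrade(Trade t)                           { trades.put(t.id(), t); }
    void replicateCostCentre(String cc, String book) { costCentreToBook.put(cc, book); }
    void replicateCcy(String ccy, double rate)       { ccyToUsdRate.put(ccy, rate); }

    // The 'join' is a single, purely in-process step: scan the local facts and
    // bind the replicated dimensions to them - no cross-node hop per row.
    List<String> enrichedTradesFor(String costCentre) {
        return trades.values().stream()
                .filter(t -> t.costCentre().equals(costCentre))
                .map(t -> t.id() + " / " + costCentreToBook.get(t.costCentre())
                        + " / " + t.notional() * ccyToUsdRate.get(t.ccy()) + " USD")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        PartitionNode node = new PartitionNode();
        node.replicateCostCentre("CC1", "FX Options Book");
        node.replicateCcy("GBP", 1.25);
        node.putTrade(new Trade(42L, "CC1", "GBP", 1_000_000));
        node.enrichedTradesFor("CC1").forEach(System.out::println);
    }
}
```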

Conclusion

With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern
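The Connected Replication idea can be sketched in the same toy model; the class and method names below are hypothetical, not ODC's actual implementation. A dimension value is pushed to the replicated layer only when a newly written fact references it, and a fuller version would recurse through the fact's whole object graph of references, which is why only the 'connected' subset of dimension data ends up copied to every node.

```java
import java.util.*;

// Illustrative sketch only - class and method names are hypothetical, not ODC's.
public class ConnectedReplicator {

    record Trade(long id, String costCentre, String counterparty) {}

    // 'Connected' dimension caches: an entry is replicated to every node only
    // once some stored fact actually references it.
    private final Set<String> connectedCostCentres    = new HashSet<>();
    private final Set<String> connectedCounterparties = new HashSet<>();

    // Called when a trade (fact) is written to the partitioned layer.
    void onTradeWrite(Trade trade) {
        replicateIfNewlyConnected(trade.costCentre(), connectedCostCentres);
        replicateIfNewlyConnected(trade.counterparty(), connectedCounterparties);
        // A fuller version would recurse through every reference the fact holds,
        // and through the references those dimensions hold in turn, so the whole
        // connected sub-graph ends up in the replicated layer.
    }

    private void replicateIfNewlyConnected(String key, Set<String> cache) {
        if (cache.add(key)) {                 // pay the replication cost only on first use
            broadcastToAllNodes(key);
        }
    }

    private void broadcastToAllNodes(String key) {
        System.out.println("replicating newly connected dimension: " + key);
    }

    public static void main(String[] args) {
        ConnectedReplicator r = new ConnectedReplicator();
        r.onTradeWrite(new Trade(1, "CC1", "Goldman"));
        r.onTradeWrite(new Trade(2, "CC1", "Goldman"));  // nothing new to replicate
    }
}
```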

The End

• Further details online: http://www.benstopford.com

• Questions
