Advanced Databases, Ben Stopford
TRANSCRIPT
Data Storage for Extreme Use Cases: The Lay of the Land and a Peek at ODC
Ben Stopford, RBS
How fast is a HashMap lookup?
~20 ns
That's how long it takes light to travel across a room.
How fast is a database lookup?
~20 ms
That's how long it takes light to go to Australia and back. 3 times.
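The gap is easy to reproduce on one machine. A minimal sketch (Python rather than Java, with a local SQLite file standing in for the database; the table name, key range and sampling stride are invented for illustration):

```python
import sqlite3
import tempfile
import time

N = 100_000
data = {i: f"value-{i}" for i in range(N)}  # the "HashMap"

# An on-disk SQLite table stands in for the database (illustrative only).
db_path = tempfile.mktemp(suffix=".db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO kv VALUES (?, ?)", data.items())
conn.commit()

def avg_ns(lookup, keys):
    """Average wall-clock time per lookup, in nanoseconds."""
    start = time.perf_counter()
    for k in keys:
        lookup(k)
    return (time.perf_counter() - start) * 1e9 / len(keys)

keys = list(range(0, N, 101))
map_ns = avg_ns(lambda k: data[k], keys)
db_ns = avg_ns(
    lambda k: conn.execute("SELECT v FROM kv WHERE k = ?", (k,)).fetchone()[0],
    keys,
)
print(f"hash map: ~{map_ns:.0f} ns/lookup, SQL: ~{db_ns:.0f} ns/lookup")
```

Even with no network hop and SQLite's pages cached in memory, the SQL path is typically well over an order of magnitude slower than the hash lookup; a remote database adds the network and the disk on top of that.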
Computers really are very fast.
The problem is we're quite good at writing software that slows them down.
Question: Is it fair to compare the performance of a Database with a HashMap?
Of course not…
• Physical Diversity: a database call involves both Network and Disk.
• Functional Diversity: databases provide a wealth of additional features, including persistence, transactions, consistency etc.
[Chart: the latency ladder, from ps up to ms — L1 cache ref, L2 cache ref, main memory ref, 1MB from main memory, Ethernet ping, 1MB from disk/Ethernet, cross-continental round trip, RDMA over InfiniBand]
An L1 ref is about 2 clock cycles, or 0.7ns. This is the time it takes light to travel 20cm.
Mechanical Sympathy
Key Point 1
Simple computer programs, operating in a single address space, are extremely fast.
Why are there so many types of database these days? …because we need different architectures for different jobs.
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBM's System R).
"Because RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark, then there is no market where they are competitive. As such, they should be considered as legacy technology, more than a quarter of a century in age, for which a complete redesign and re-architecting is the appropriate next step."
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
• Data lives on disk.
• Users have an allocated user space where intermediary results are calculated.
• The database brings data, normally via indexes, into memory, and performs filter, join, reordering and aggregation operations.
• The result is sent to the user.
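In miniature, those steps look something like this (a toy pipeline in Python; the tables, rows and predicate are invented for illustration):

```python
# Toy rows, standing in for pages brought into memory via an index.
orders = [
    {"id": 1, "customer": "acme", "amount": 120},
    {"id": 2, "customer": "acme", "amount": 80},
    {"id": 3, "customer": "globex", "amount": 200},
]
customers = {"acme": "UK", "globex": "US"}  # a second, smaller table

# Filter, join, reorder and aggregate in the user's workspace.
filtered = [o for o in orders if o["amount"] >= 100]                    # filter
joined = [{**o, "region": customers[o["customer"]]} for o in filtered]  # join
joined.sort(key=lambda o: o["amount"], reverse=True)                    # reorder
total = sum(o["amount"] for o in joined)                                # aggregate

print(total)  # 320: the result sent back to the user
```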
[Diagram: the spectrum of architectures — Traditional, Shared Disk, In Memory, Distributed In-Memory, Shared Nothing, and a Simpler Contract]
Key Point 2
Different architectural decisions about how we store and access data are needed in different environments. Our 'Context' has changed.
Simplifying the Contract
How big is the internet?
5 exabytes
(which is 5,000 petabytes, or 5,000,000 terabytes)
How big is an average enterprise database?
80% < 1TB (in 2009)
The context of our problem has changed.
Simplifying the Contract
• For some use cases, ACID transactions are overkill.
• Implementing ACID in a distributed architecture has a significant effect on performance.
• This is where the NoSQL movement came from.
Databases have huge operational overheads.
Research with Shore DB indicates only 6.8% of instructions contribute to 'useful work'.
Taken from "OLTP Through the Looking Glass, and What We Found There", Harizopoulos et al.
Avoid that overhead with a simpler contract and by avoiding IO.
Key Point 3
For the very top-end data volumes, a simpler contract is mandatory. ACID is simply not possible.
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows it.
Options for scaling out the traditional architecture
1. The Shared Disk Architecture
• More 'grunt'
• Popular for mid-range data sets
• Multiple machines must contend for ownership (distributed disk-lock contention)
2. The Shared Nothing Architecture
• Massive storage potential
• Massive scalability of processing
• Popular for high-end storage solutions
• Commodity hardware
• Around since the 80s, but only really popular since the Big Data era
• Limited by cross-partition joins
Each machine is responsible for a subset of the records. Each record exists on only one machine.
[Diagram: a client routing requests across partitions, each node owning a distinct subset of the records — e.g. 1, 2, 3… on one node, 97, 98, 99… on another]
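Routing in a shared-nothing store can be as simple as hashing the key (a sketch; the node names are hypothetical, and real systems add replication and rebalancing on top):

```python
NODES = ["node-a", "node-b", "node-c"]  # hypothetical machine names

def owner(key: int) -> str:
    """Each record lives on exactly one machine: the one its key hashes to."""
    return NODES[hash(key) % len(NODES)]

# Every key maps to one and only one node; a query touching keys owned by
# different nodes (a cross-partition join) must gather rows from each owner.
placement = {k: owner(k) for k in range(10)}
print(placement)
```

Real systems typically replace the modulo with consistent hashing, so that adding a node does not reshuffle every record.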
3. The In-Memory Database (single address space)
Databases must cache subsets of the data in memory.
[Diagram: a cache holding 90% of the data, with the rest on disk]
Not knowing what you don't know: most queries still go to disk to "see what they missed".
If you can fit it ALL in memory, you know everything.
The architecture of an in-memory database:
• All data is at your fingertips.
• Query plans become less important, as there is no IO.
• Intermediary results are just pointers.
Memory is at least 100x faster than disk.
[Chart: the latency ladder again — L1 cache ref, L2 cache ref, main memory ref, 1MB from main memory, then a cross-network round trip, 1MB from disk/network and a cross-continental round trip sitting orders of magnitude higher]
Random vs Sequential Access: memory allows random access; disk only works well for sequential reads. This makes in-memory databases very fast.
The proof is in the stats. TPC-H benchmarks on a 1TB data set:
• Exasol: 4,253,937 QphH (in-memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH
NB: TPC-H is a decision-support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's SPARC SuperCluster.
So why haven't in-memory databases taken off?
Address spaces are relatively small, and of a finite, fixed size.
• What happens when your data grows beyond your available memory? The 'one more bit' problem.
Durability: what happens when you pull the plug?
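The standard answer to the pulled plug is a write-ahead log: append and fsync each change before applying it in memory, and replay the log on restart. A minimal sketch (the class, file name and record format are invented for illustration):

```python
import json
import os
import tempfile

class TinyStore:
    """An in-memory map made durable by a write-ahead log (illustrative)."""

    def __init__(self, log_path):
        self.data = {}
        if os.path.exists(log_path):            # recovery: replay the log
            with open(log_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value
        self.log = open(log_path, "a")

    def put(self, key, value):
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())             # on disk before we acknowledge
        self.data[key] = value                  # only then applied in memory

# Write two entries, then simulate a restart by reopening the same log.
path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = TinyStore(path)
store.put("a", 1)
store.put("b", 2)
recovered = TinyStore(path)
print(recovered.data)  # {'a': 1, 'b': 2}
```

Pulling the plug after the fsync loses nothing; a crash between the append and the in-memory update is repaired by the replay. Log appends are sequential writes, which is exactly what disks are good at.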
How fast is a HashMap lookup
~20 ns
Thatrsquos how long it takes light to travel a room
How fast is a database lookup
~20 ms
Thatrsquos how long it takes light to go to Australia and
back
3 times
Computers really are very fast
The problem is wersquore quite good at writing software that
slows them down
Question
Is it fair to compare the performance of a Database with a HashMap
Of course nothellipbull Physical Diversity A database call
involves both Network and Diskbull Functional Diversity Databases provide a
wealth of additional features including persistence transactions consistency etc
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Ethernet ping
Cross Continental Round Trip
1MB DiskEthernet
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
RDMA over Infiniband
Mechanical Sympathy
Key Point 1
Simple computer programs operating in a single address space
are extremely fast
Why are there so many types of database these dayshellipbecause we need different architectures for different jobs
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
Thatrsquos how long it takes light to travel a room
How fast is a database lookup
~20 ms
Thatrsquos how long it takes light to go to Australia and
back
3 times
Computers really are very fast
The problem is wersquore quite good at writing software that
slows them down
Question
Is it fair to compare the performance of a Database with a HashMap
Of course nothellipbull Physical Diversity A database call
involves both Network and Diskbull Functional Diversity Databases provide a
wealth of additional features including persistence transactions consistency etc
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Ethernet ping
Cross Continental Round Trip
1MB DiskEthernet
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
RDMA over Infiniband
Mechanical Sympathy
Key Point 1
Simple computer programs operating in a single address space
are extremely fast
Why are there so many types of database these dayshellipbecause we need different architectures for different jobs
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
How fast is a database lookup
~20 ms
Thatrsquos how long it takes light to go to Australia and
back
3 times
Computers really are very fast
The problem is wersquore quite good at writing software that
slows them down
Question
Is it fair to compare the performance of a Database with a HashMap
Of course nothellipbull Physical Diversity A database call
involves both Network and Diskbull Functional Diversity Databases provide a
wealth of additional features including persistence transactions consistency etc
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Ethernet ping
Cross Continental Round Trip
1MB DiskEthernet
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
RDMA over Infiniband
Mechanical Sympathy
Key Point 1
Simple computer programs operating in a single address space
are extremely fast
Why are there so many types of database these dayshellipbecause we need different architectures for different jobs
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
Thatrsquos how long it takes light to go to Australia and
back
3 times
Computers really are very fast
The problem is wersquore quite good at writing software that
slows them down
Question
Is it fair to compare the performance of a Database with a HashMap
Of course nothellipbull Physical Diversity A database call
involves both Network and Diskbull Functional Diversity Databases provide a
wealth of additional features including persistence transactions consistency etc
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Ethernet ping
Cross Continental Round Trip
1MB DiskEthernet
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
RDMA over Infiniband
Mechanical Sympathy
Key Point 1
Simple computer programs operating in a single address space
are extremely fast
Why are there so many types of database these dayshellipbecause we need different architectures for different jobs
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
3 times
Computers really are very fast
The problem is wersquore quite good at writing software that
slows them down
Question
Is it fair to compare the performance of a Database with a HashMap
Of course nothellipbull Physical Diversity A database call
involves both Network and Diskbull Functional Diversity Databases provide a
wealth of additional features including persistence transactions consistency etc
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Ethernet ping
Cross Continental Round Trip
1MB DiskEthernet
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
RDMA over Infiniband
Mechanical Sympathy
Key Point 1
Simple computer programs operating in a single address space
are extremely fast
Why are there so many types of database these dayshellipbecause we need different architectures for different jobs
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
• The result is sent to the user
[Diagram: architecture options (Traditional, Shared Disk, Shared Nothing, In-Memory, Distributed In-Memory) arranged against a 'Simpler Contract' axis]
Key Point 2
Different architectural decisions about how we store and access data are needed in different environments. Our 'context' has changed.
Simplifying the Contract
How big is the internet?
5 exabytes (which is 5,000 petabytes, or 5,000,000 terabytes)
How big is an average enterprise database?
80% < 1 TB (in 2009)
The context of our problem has changed.
Simplifying the Contract
• For some use cases, ACID transactions are overkill
• Implementing ACID in a distributed architecture has a significant effect on performance
• This is where the NoSQL movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 6.8% of instructions contribute to 'useful work'.
Taken from "OLTP Through the Looking Glass, and What We Found There", Harizopoulos et al.
Avoid that overhead with a simpler contract, and by avoiding I/O.
Key Point 3
For the very top-end data volumes, a simpler contract is mandatory. ACID is simply not possible.
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows it.
Options for scaling out the traditional architecture
1. The Shared Disk Architecture
• More 'grunt'
• Popular for mid-range data sets
• Multiple machines must contend for ownership (distributed disk/lock contention)
2. The Shared Nothing Architecture
• Massive storage potential
• Massive scalability of processing
• Popular for high-level storage solutions
• Commodity hardware
• Around since the 80s, but only really popular since the Big Data era
• Limited by cross-partition joins
Each machine is responsible for a subset of the records. Each record exists on only one machine.
[Diagram: a client routing requests to partitions, each holding a disjoint subset of record IDs]
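The shared-nothing routing rule above (each record lives on exactly one machine, chosen deterministically from its key) can be sketched in a few lines of Java. The class and method names here are illustrative, not from the talk; a real store would route over the network rather than to local maps.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of shared-nothing partitioning: a key hashes to exactly one
// partition, so each record exists on only one "machine".
public class PartitionRouter {
    private final int partitionCount;
    private final Map<Integer, Map<String, String>> partitions = new HashMap<>();

    public PartitionRouter(int partitionCount) {
        this.partitionCount = partitionCount;
        for (int i = 0; i < partitionCount; i++) {
            partitions.put(i, new HashMap<>());
        }
    }

    // Deterministic key -> partition mapping; floorMod keeps it non-negative.
    public int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    public void put(String key, String value) {
        partitions.get(partitionFor(key)).put(key, value);
    }

    public String get(String key) {
        // A single-key lookup touches only the owning partition. A join
        // across keys on different partitions needs cross-partition traffic,
        // which is exactly the limitation noted above.
        return partitions.get(partitionFor(key)).get(key);
    }
}
```

A single-key get stays local to one partition; the scheme only strains when a query must combine rows from several partitions.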
3. The In-Memory Database (single address space)
Databases must cache subsets of the data in memory.
Not knowing what you don't know: most queries still go to disk to "see what they missed".
[Diagram: 90% of the data in cache, the rest on disk]
If you can fit it ALL in memory, you know everything.
The architecture of an in-memory database
• All data is at your fingertips
• Query plans become less important, as there is no I/O
• Intermediary results are just pointers
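"Intermediary results are just pointers" can be made concrete with a small sketch (the class and field names are hypothetical): in a single address space, an index is just a map from a field value to object references, and a query result is a list of those same references, never a serialised copy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of an in-memory index: query results are references to the
// original objects, so "intermediary results" cost one pointer each.
public class InMemoryIndex {
    public static class Trade {
        public final String id;
        public final String counterparty;
        public Trade(String id, String counterparty) {
            this.id = id;
            this.counterparty = counterparty;
        }
    }

    private final Map<String, List<Trade>> byCounterparty = new HashMap<>();

    public void add(Trade t) {
        byCounterparty.computeIfAbsent(t.counterparty, k -> new ArrayList<>()).add(t);
    }

    // The "query" is one hash lookup; no rows are copied or materialised.
    public List<Trade> tradesWith(String counterparty) {
        return byCounterparty.getOrDefault(counterparty, List.of());
    }
}
```

Contrast this with the traditional architecture above, where intermediary results are materialised in an allocated user space after a round trip through disk and network.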
Memory is at least 100x faster than disk
[Chart: access latencies spanning picoseconds through milliseconds: L1 cache ref, L2 cache ref, main-memory ref, 1 MB from main memory, cross-network round trip, 1 MB from disk/network, cross-continental round trip]
L1 ref is about 2 clock cycles, or 0.7 ns. This is the time it takes light to travel 20 cm.
Random vs Sequential Access: memory allows random access; disk only works well for sequential reads. This makes in-memory databases very fast.
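The access-pattern point can be demonstrated with a toy harness (illustrative, not from the talk): the two methods below do identical summing work over the same array, one in sequential order and one in shuffled order, so wrapping each call in `System.nanoTime()` isolates the cost of the access pattern alone.

```java
import java.util.Random;

// Same work, different access pattern: sequential traversal benefits from
// caches and prefetching; the shuffled traversal defeats both.
public class AccessPattern {
    public static long sumSequential(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    public static long sumRandomOrder(int[] data, int[] order) {
        long sum = 0;
        for (int i = 0; i < order.length; i++) {
            sum += data[order[i]];
        }
        return sum;
    }

    // Fisher-Yates shuffle of the indices 0..n-1, seeded for repeatability.
    public static int[] shuffledIndices(int n, long seed) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Random rnd = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return idx;
    }
}
```

Even entirely in RAM the shuffled order is measurably slower on a large enough array; replay the same comparison against a file on disk and the gap widens by orders of magnitude.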
The proof is in the stats: TPC-H benchmarks on a 1 TB data set
• Exasol: 4,253,937 QphH (in-memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH
NB: TPC-H is a decision-support benchmark. For OLTP, the traditional architectures currently do well, most notably Oracle's SPARC SuperCluster.
So why haven't in-memory databases taken off?
Address spaces are relatively small, and of a finite, fixed size.
• What happens when your data grows beyond your available memory? (The 'one more bit' problem.)
Durability: what happens when you pull the plug?
The problem is wersquore quite good at writing software that
slows them down
Question
Is it fair to compare the performance of a Database with a HashMap
Of course nothellipbull Physical Diversity A database call
involves both Network and Diskbull Functional Diversity Databases provide a
wealth of additional features including persistence transactions consistency etc
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Ethernet ping
Cross Continental Round Trip
1MB DiskEthernet
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
RDMA over Infiniband
Mechanical Sympathy
Key Point 1
Simple computer programs operating in a single address space
are extremely fast
Why are there so many types of database these dayshellipbecause we need different architectures for different jobs
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
Question
Is it fair to compare the performance of a Database with a HashMap
Of course nothellipbull Physical Diversity A database call
involves both Network and Diskbull Functional Diversity Databases provide a
wealth of additional features including persistence transactions consistency etc
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Ethernet ping
Cross Continental Round Trip
1MB DiskEthernet
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
RDMA over Infiniband
Mechanical Sympathy
Key Point 1
Simple computer programs operating in a single address space
are extremely fast
Why are there so many types of database these dayshellipbecause we need different architectures for different jobs
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
Of course nothellipbull Physical Diversity A database call
involves both Network and Diskbull Functional Diversity Databases provide a
wealth of additional features including persistence transactions consistency etc
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Ethernet ping
Cross Continental Round Trip
1MB DiskEthernet
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
RDMA over Infiniband
Mechanical Sympathy
Key Point 1
Simple computer programs operating in a single address space
are extremely fast
Why are there so many types of database these dayshellipbecause we need different architectures for different jobs
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Ethernet ping
Cross Continental Round Trip
1MB DiskEthernet
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
RDMA over Infiniband
Mechanical Sympathy
Key Point 1
Simple computer programs operating in a single address space
are extremely fast
Why are there so many types of database these dayshellipbecause we need different architectures for different jobs
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
Key Point 1
Simple computer programs operating in a single address space
are extremely fast
Why are there so many types of database these dayshellipbecause we need different architectures for different jobs
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
Why are there so many types of database these dayshellipbecause we need different architectures for different jobs
Times are changing
Traditional Database Architecture is Aging
Most modern databases still follow a 1970s architecture (for example IBMrsquos System R)
ldquoBecause RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark then there is no market where they are competitive As such they should be considered as legacy technology more than a quarter of a century in age for which a complete redesign and re-architecting is the appropriate next steprdquo
Michael Stonebraker (Creator of Ingres and Postgres)
The Traditional Architecture
bull Data lives on diskbull Users have an allocated user
space where intermediary results are calculated
bull The database brings data normally via indexes into memory and performs filters joins reordering and aggregation operations
bull The result is sent to the user
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 2
Different architectural decisions about how we store and access data are needed in different
environments Our lsquoContextrsquo has
changed
Simplifying the Contract
How big is the internet
5 exabytes
(which is 5000 petabytes or
5000000 terabytes)
How big is an average enterprise database
80 lt 1TB(in 2009)
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and avoiding IO
Key Point 3
For the very top end data volumes a
simpler contract is mandatory ACID is simply not possible
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows
it
Options for scaling-out
the traditional
architecture
1 The Shared Disk Architecture
SharedDisk
bull More lsquogruntrsquobull Popular for mid-
range data setsbull Multiple machines
must contend for ownership (Distributed disklock contention)
2 The Shared Nothing Architecture
bull Massive storage potential
bull Massive scalability of processing
bull Popular for high level storage solutions
bull Commodity hardwarebull Around since the 80rsquos
but only really popular since the BigData era
bull Limited by cross partition joins
Each machine is responsible for a subset of the records Each record
exists on only one machine
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
3 The In Memory Database
(single address-space)
Databases must cache subsets of the data in
memory
Cache
Not knowing what you donrsquot know
Most queries still go to disk to ldquosee what they missedrdquo
Data on Disk
90 in Cache
If you can fit it ALL in memory you know everything
The architecture of an in memory database
bull All data is at your fingertips
bull Query plans become less important as there is no IO
bull Intermediary results are just pointers
Memory is at least 100x faster than disk
0000000000000
μs ns psms
L1 Cache Ref
L2 Cache Ref
Main MemoryRef
1MB Main Memory
Cross Network Round Trip
Cross Continental Round Trip
1MB DiskNetwork
L1 ref is about 2 clock cycles or 07ns This is the time it takes light to travel 20cm
Random vs Sequential AccessMemory allows random access Disk only works well for sequential reads
This makes them very fast
The proof is in the stats TPC-H Benchmarks on a
1TB data setbull Exasol 4253937 QphH (In-Memory DB)bull Oracle Database 11g (RAC) 1166976 QphHbull SQL Server 173961QphH
bull NB ndash TPC-H is a decision support benchmark For OLTP the traditional architectures currently do well Most notably Oraclersquos Sparc Supercluster
So why havenrsquot in-memory databases
taken off
Address-Spaces are relatively small and of a
finite fixed size
bull What happens when your data grows beyond your available memory
The lsquoOne more bit problemrsquo
Durability
What happens when you pull the plug
The context of our
problem has changed
Simplifying the Contract
bull For some use cases ACIDTransactions are overkill
bull Implementing ACID in a distributed architecture has a significant affect on performance
bull This is where the NoSQL Movement came from
Databases have huge operational overheads
Research with Shore DB indicates only 68 of
instructions contribute to lsquouseful workrsquo
Taken from ldquoOLTP Through the Looking Glass and What We Found Thererdquo Harizopoulos et al
Avoid that overhead with a simpler contract and by avoiding I/O
Key Point 3
For the very top-end data volumes, a simpler contract is mandatory. ACID is simply not possible.
Key Point 3 (addendum)
But we should always retain ACID properties if our use case allows it
Options for scaling out the traditional architecture
1. The Shared Disk Architecture
• More 'grunt'
• Popular for mid-range data sets
• Multiple machines must contend for ownership (distributed disk/lock contention)
2. The Shared Nothing Architecture
• Massive storage potential
• Massive scalability of processing
• Popular for high-end storage solutions
• Commodity hardware
• Around since the 80s, but only really popular since the Big Data era
• Limited by cross-partition joins
Each machine is responsible for a subset of the records. Each record exists on only one machine.
[Diagram: records (1, 2, 3…, 97, 98, 99…, 765, 769…) partitioned across machines, with a Client routing requests]
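The placement rule above, where each record lives on exactly one machine, can be sketched with simple hash routing. This is an illustrative sketch only; the node names and record keys are assumptions, not from the talk.

```python
# Hedged sketch of shared-nothing key routing: each record is owned by
# exactly one node, chosen by hashing its key. Node names are illustrative.

NODES = ["node-a", "node-b", "node-c"]

def owner(key, nodes=NODES):
    """Return the single node responsible for this key."""
    return nodes[hash(key) % len(nodes)]

# Routing is deterministic, so reads and writes for one key always land
# on the same machine, and no record is stored twice.
partitions = {n: [] for n in NODES}
for record_id in range(100):
    partitions[owner(record_id)].append(record_id)

assert owner(42) == owner(42)
assert sum(len(p) for p in partitions.values()) == 100
```

Single-key operations never need cross-node coordination under this rule; the cost shows up only in cross-partition joins, as the slide notes.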
3. The In-Memory Database (single address space)
Databases must cache subsets of the data in memory
Not knowing what you don't know
Most queries still go to disk to 'see what they missed'
[Diagram: 90% of the data in cache, the rest on disk]
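The 'see what they missed' problem can be made concrete with a toy sketch. The table and cache contents below are assumed, illustrative data, not anything from the talk.

```python
# Toy illustration of "not knowing what you don't know": with 90% of rows
# cached, a filter query still cannot be answered from cache alone, because
# the cache cannot prove which matching rows it is missing.

all_rows = list(range(100))   # the full table, on "disk"
cache = set(all_rows[:90])    # 90% of rows happen to be cached

def rows_greater_than(threshold):
    cached_hits = [r for r in cache if r > threshold]
    # The query must still go to disk to see what it missed:
    disk_hits = [r for r in all_rows if r not in cache and r > threshold]
    return sorted(cached_hits + disk_hits)

assert rows_greater_than(95) == [96, 97, 98, 99]  # all four came from disk
```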
If you can fit it ALL in memory, you know everything
The architecture of an in-memory database
• All data is at your fingertips
• Query plans become less important, as there is no I/O
• Intermediate results are just pointers
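The 'results are just pointers' point holds in any single address space. This Python sketch (illustrative records, not ODC code) shows a filter result that references the stored objects rather than copying them:

```python
# In a single address space, an intermediate result set is just a list of
# references to records already in memory; nothing is copied to a staging
# area. The records here are illustrative.

records = [{"id": i, "value": i * 10} for i in range(1000)]

# "Query": the filter produces pointers to the same objects, not copies.
result = [r for r in records if r["value"] > 9900]

assert len(result) == 9                            # ids 991..999 match
assert all(r is records[r["id"]] for r in result)  # same objects, no copies
```

Contrast this with the traditional architecture in the introduction, where intermediate results are materialised into an allocated user space.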
Memory is at least 100x faster than disk
[Chart: latency scale from ps to ms: L1 cache reference, L2 cache reference, main memory reference, reading 1MB from main memory, cross-network round trip, reading 1MB from disk/network, cross-continental round trip]
An L1 reference is about 2 clock cycles, or 0.7 ns. This is the time it takes light to travel 20 cm.
Random vs Sequential Access: memory allows random access; disk only works well for sequential reads
This makes in-memory databases very fast
The proof is in the stats. TPC-H benchmarks on a 1TB data set:
• Exasol: 4,253,937 QphH (in-memory DB)
• Oracle Database 11g (RAC): 1,166,976 QphH
• SQL Server: 173,961 QphH
NB: TPC-H is a decision-support benchmark. For OLTP the traditional architectures currently do well, most notably Oracle's SPARC SuperCluster.
So why haven't in-memory databases taken off?
Address spaces are relatively small and of a finite, fixed size
• What happens when your data grows beyond your available memory? The 'one more bit' problem
Durability
What happens when you pull the plug?
One solution is distribution
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
[Diagram: key ranges (1, 2, 3…; 97, 98, 99…; 169, 170…; 244, 245…; 333, 334…; 765, 769…) spread across nodes, accessed by a client]
Distribution solves our two problems
• Solve the 'one more bit' problem by adding more hardware
• Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of losing the single address space
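The two fixes above can be sketched in a few lines: keys hash across primary nodes (add nodes to grow past one address space), and every write is copied to a backup on a different node so pulling the plug on one machine loses nothing. This is a toy illustration, not ODC's actual implementation; all names are made up.

```python
# Toy shared-nothing in-memory store: data is hash-partitioned across
# primary nodes, and every write is also copied to a backup held on a
# *different* node, so losing one node loses no data.
class Cluster:
    def __init__(self, node_count):
        self.primaries = [{} for _ in range(node_count)]
        self.backups = [{} for _ in range(node_count)]

    def _owner(self, key):
        return hash(key) % len(self.primaries)

    def put(self, key, value):
        n = self._owner(key)
        self.primaries[n][key] = value
        # The backup lives on the *next* node, never the same machine.
        self.backups[(n + 1) % len(self.primaries)][key] = value

    def get(self, key):
        return self.primaries[self._owner(key)].get(key)

    def fail_node(self, n):
        # Simulate pulling the plug on node n, then promote the backups
        # that other nodes hold for n's keys.
        self.primaries[n] = {}
        for other in range(len(self.primaries)):
            if other != n:
                for key, value in self.backups[other].items():
                    if self._owner(key) == n:
                        self.primaries[n][key] = value


cluster = Cluster(4)
for i in range(100):
    cluster.put(f"trade-{i}", {"id": i})
cluster.fail_node(0)
# Every entry survives the failure because its backup was elsewhere.
assert all(cluster.get(f"trade-{i}") == {"id": i} for i in range(100))
```

Adding a node grows total addressable memory linearly, which is the whole point of the shared-nothing approach.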
[Chart: the architectures – Traditional, Shared Disk, Shared Nothing, In Memory, Distributed In-Memory – arranged by degree of distribution and simplicity of contract]
Key Point 4: There are three key forces
• Distribution – gain scalability through a distributed architecture
• Simplify the contract – improve scalability by picking appropriate ACID properties
• No disk – all data is held in RAM
These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse.
ODC
ODC represents a balance between throughput and latency.
What is Latency?
Latency is a measure of response time.
What is Throughput?
Throughput is a measure of the work/messages consumed in a prescribed amount of time.
Which is best for latency?
Traditional Database vs Shared Nothing (Distributed) In-Memory Database
Which is best for throughput?
Traditional Database vs Shared Nothing (Distributed) In-Memory Database
So why do we use distributed in-memory? Because it combines in-memory speed (latency) with plentiful hardware (throughput).
ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised
• Real-time graph DB
• 450 processes
• 2TB of RAM
• Messaging (topic-based) as a system of record (persistence)
The Layers
• Access Layer – Java client APIs
• Query Layer
• Data Layer – Transactions, Cashflows, MTMs
• Persistence Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools?
Replication puts data everywhere: wherever you go, the data will be there. But your storage is limited by the memory on a node.
Partitioning scales (e.g. keys Aa–Ap on one node): scalable storage, bandwidth and processing.
But associating data in different partitions implies moving it.
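That last point is worth making concrete: with hash-partitioning, two related entities whose keys differ usually land on different nodes, so associating them costs a network hop. A minimal sketch with made-up key names:

```python
# Hash-partitioning: each key deterministically maps to one partition.
# Related entities with *different* keys mostly land on different
# partitions, so joining them means moving data between nodes.
NODES = 3


def partition(key):
    return hash(key) % NODES


# 1,000 trades, each referencing one of 50 parties; count how many
# trade->party lookups leave the trade's node (a network hop in a grid).
hops = 0
for i in range(1000):
    trade_key = f"trade:{i}"
    party_key = f"party:{i % 50}"
    if partition(trade_key) != partition(party_key):
        hops += 1
# With independent hashing, roughly two thirds of lookups cross the
# network -- collocation is an accident, not a guarantee.
assert hops > 0
```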
So we have some data. Our data is bound together in a model (Trade, Party, Trader, with nested Desk, Name and Sub entities).
Which we save: the Trade, Party and Trader entities end up spread across different machines.
Binding them back together involves a "distributed join" => lots of network hops.
The hops have to be spread over time: each network call completes before the next begins.
Lots of network hops makes it slow.
OK – what if we held it all together, "denormalised"?
Hence denormalisation is FAST (for reads).
Denormalisation implies the duplication of some sub-entities…
…and that means managing consistency over lots of copies…
…and all the duplication means you run out of space really quickly.
Space issues are exacerbated further when data is versioned.
(Versions 1, 2, 3 and 4 of a Trade each duplicate its Party and Trader.)
…and you need versioning to do MVCC.
And reconstituting a previous time slice becomes very difficult.
So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage.
Remember, this means the object graph will be split across multiple machines.
Each entity is independently versioned. Each piece of data is a singleton.
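The independent-versioning point can be sketched directly: key each entity by (id, version) and have a Trade reference its Party and Trader by id rather than by value, so a new Trade version copies nothing else. A hypothetical structure, not ODC's storage format:

```python
# Normalised + versioned: each entity is a singleton keyed by
# (entity_id, version), and a Trade references its Party/Trader by id,
# not by value -- so re-versioning a Trade duplicates nothing.
store = {}


def put(entity_id, version, data):
    store[(entity_id, version)] = data


put("party:gs", 1, {"name": "Goldmans"})
put("trader:tb", 1, {"name": "T. Brown"})
put("trade:42", 1, {"party": "party:gs", "trader": "trader:tb", "qty": 100})
put("trade:42", 2, {"party": "party:gs", "trader": "trader:tb", "qty": 150})

# Two trade versions, but still exactly one copy of the Party.
party_copies = sum(1 for (eid, _) in store if eid == "party:gs")
assert party_copies == 1
```

Contrast this with the denormalised model above, where every Trade version carried its own embedded Party and Trader copies.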
Binding them back together involves a "distributed join" => lots of network hops.
Whereas in the denormalised model the join is already done.
So what we want is the advantages of a normalised store at the speed of a denormalised one. This is what using Snowflake Schemas and the Connected Replication pattern is all about.
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?
It's all about the keys.
We can collocate data with common keys, but if the keys crosscut, the only way to collocate is to replicate.
We tackle this problem with a hybrid model: Trade is partitioned; Party and Trader are replicated.
We adapt the concept of a Snowflake Schema, taking the concept of Facts and Dimensions.
Everything starts from a Core Fact (Trades, for us).
Facts are big; dimensions are small.
Facts have one key that relates them all (used to partition).
Dimensions have many keys (which crosscut the partitioning key).
Looking at the data: Facts => big, common keys. Dimensions => small, crosscutting keys.
We remember we are a grid. We should avoid the distributed join…
…so we only want to 'join' data that is in the same process.
Trades and MTMs share a common key, so use a key assignment policy (e.g. KeyAssociation in Coherence) to collocate them.
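The idea behind a key assignment policy can be sketched as: partition not by the entity's own key but by an associated key, so an MTM lands wherever its parent Trade lives. This mirrors the spirit of Coherence's KeyAssociation; the class and function names here are illustrative, not the Coherence API:

```python
NODES = 4


def partition(key):
    return hash(key) % NODES


class MtmKey:
    """An MTM's key 'associates' with its parent trade's key, so its
    partition is computed from the trade id (in the spirit of
    Coherence's KeyAssociation)."""

    def __init__(self, mtm_id, trade_id):
        self.mtm_id = mtm_id
        self.trade_id = trade_id

    def associated_key(self):
        return self.trade_id


def node_for(key):
    # Route by the associated key when one is declared.
    routing = key.associated_key() if hasattr(key, "associated_key") else key
    return partition(routing)


trade_id = "trade:42"
mtm = MtmKey("mtm:7", trade_id)
# The MTM is guaranteed to live on the same node as its Trade,
# so the Trade<->MTM join never crosses the network.
assert node_for(mtm) == node_for(trade_id)
```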
So we prescribe different physical storage for Facts and Dimensions: Trade is partitioned; Party and Trader are replicated.
Facts are partitioned; dimensions are replicated.
[Diagram: the data layer holds fact storage (partitioned/distributed: Transactions, Cashflows, MTMs); the query layer holds the replicated dimensions.]
The data volumes back this up as a sensible hypothesis: Facts => big => distribute. Dimensions => small => replicate.
Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.
So how does this help us to run queries without distributed joins?
This query involves:
• Joins between Dimensions
• Joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern? A chain of sequential round trips spread over network time:
• Get Cost Centres
• Get LedgerBooks
• Get SourceBooks
• Get Transactions
• Get MTMs
• Get Legs
But by balancing Replication and Partitioning we don't need all those hops.
Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'.
Stage 1: Get the right keys to query the Facts – join Dimensions in the Query Layer, then query the partitioned fact storage (Transactions, Cashflows, MTMs).
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 2: Cluster join to get Facts – join Facts across the cluster.
Stage 2: Join the facts together efficiently, as we know they are collocated in partitioned storage (Transactions, Cashflows, MTMs).
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 3: Augment raw Facts with relevant Dimensions – join Dimensions in the Query Layer.
Stage 3: Bind relevant dimensions to the result.
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
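The three stages can be sketched end to end: filter the locally replicated dimensions to get fact keys, join the collocated facts, then decorate the result with dimension data. Toy data and field names throughout, not the real ODC schema:

```python
# Replicated dimensions: present in full on every query node.
cost_centres = {"cc1": {"code": "CC1"}, "cc2": {"code": "CC2"}}
books = {"b1": {"cost_centre": "cc1"}, "b2": {"cost_centre": "cc2"}}

# Partitioned facts: transactions and MTMs collocated by transaction id.
transactions = {"t1": {"book": "b1"}, "t2": {"book": "b2"}, "t3": {"book": "b1"}}
mtms = {"t1": 10.0, "t2": -4.2, "t3": 7.5}


def query(cost_centre_code):
    # Stage 1: join dimensions locally to find the fact keys (no hops).
    wanted_books = {b for b, d in books.items()
                    if cost_centres[d["cost_centre"]]["code"] == cost_centre_code}
    keys = [t for t, d in transactions.items() if d["book"] in wanted_books]
    # Stage 2: join facts -- collocated, so each partition joins its own keys.
    rows = [(t, mtms[t]) for t in keys]
    # Stage 3: bind the relevant dimension data onto the result.
    return [(t, mtm, cost_centre_code) for t, mtm in rows]


assert query("CC1") == [("t1", 10.0, "CC1"), ("t3", 7.5, "CC1")]
```

Note that only Stage 2 touches the partitioned data, and it does so with exact keys, so no intermediate results ever cross the network.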
Bringing it together: a Java client API over replicated Dimensions and partitioned Facts.
We never have to do a distributed join.
So all the big stuff is held partitioned, and we can join without shipping keys around and having intermediate results.
We get to do this… (hold the graph normalised, split across machines)
…and this… (version each entity independently)
…and this… (reconstitute a previous time slice)
…without the problems of this… (duplicated sub-entities eating space)
…or this… (consistency managed over many copies)
…all at the speed of this… well, almost.
But there is a fly in the ointment… I lied earlier: these aren't all Facts.
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.
Fortunately there is a simple solution: the Connected Replication pattern.
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".
If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.
Looking at the Dimension data, some are quite large. But Connected Dimension data is tiny by comparison.
One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'Connected' or 'Used' dimensions.
As data is written to the data store we keep our 'Connected Caches' up to date: the data layer holds fact storage (partitioned: Transactions, Cashflows, MTMs); the processing layer holds the replicated dimension caches.
As new Facts are added, relevant Dimensions that they reference are moved to processing-layer caches.
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.
Saving a trade causes all its first-level references (Party, Alias, Source, Book, Ccy) to be triggered: the save hits the partitioned cache in the data layer (all normalised), and a cache-store trigger updates the query layer's connected dimension caches.
This updates the connected caches, and the process recurses through the object graph (the Trade to its Party, Alias, Source, Book and Ccy, then on to the Party's LedgerBook, and so on), from the data layer (all normalised) into the query layer's connected dimension caches.
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
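The recursion itself is small: when a fact is saved, walk its foreign keys transitively and replicate only the dimensions actually reached. A simplified model with made-up entity ids, not ODC's real domain:

```python
# Each entity lists the foreign keys it references.
entities = {
    "trade:1": ["party:gs", "trader:tb"],
    "party:gs": ["ledgerbook:lb1"],
    "trader:tb": ["book:b1", "ccy:usd"],
    "ledgerbook:lb1": [],
    "book:b1": [],
    "ccy:usd": [],
    # Present in the dimension data but referenced by no fact:
    "party:unused": ["ledgerbook:lb9"],
    "ledgerbook:lb9": [],
}


def connected(fact_ids):
    """Recurse through foreign keys from the saved facts; only what is
    reached gets replicated to the query-layer caches."""
    seen = set()
    stack = list(fact_ids)
    while stack:
        entity = stack.pop()
        if entity not in seen:
            seen.add(entity)
            stack.extend(entities[entity])
    return seen


replicated = connected(["trade:1"]) - {"trade:1"}  # facts stay partitioned
assert "party:unused" not in replicated
assert replicated == {"party:gs", "trader:tb", "ledgerbook:lb1", "book:b1", "ccy:usd"}
```

The unused party and its ledger book never enter the replicated caches, which is where the 1/10th saving comes from.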
Limitations of this approach
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
Distributed In Memory (Shared Nothing)
Again we spread our data but this time only using RAM
765 769hellip
1 2 3hellip 97 98 99hellip
333 334hellip 244 245hellip
169 170hellipClient
Distribution solves our two problems
bull Solve the lsquoone more bitrsquo problem by adding more hardware
bull Solve the durability problem with backups on another machine
We get massive amounts of parallel processing
But at the cost of
loosing the single
address space
Traditional
Distributed In
Memory
Shared Disk
In Memory
Shared Nothing
Simpler Contract
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
Again we spread our data, but this time only using RAM.
Distribution solves our two problems:
• Solve the 'one more bit' problem by adding more hardware
• Solve the durability problem with backups on another machine
We get massive amounts of parallel processing. But at the cost of losing the single address space.
[Diagram: the architecture spectrum – Traditional, Shared Disk, In Memory, Shared Nothing, Distributed In-Memory – plotted against a Simpler Contract axis]
Key Point 4: There are three key forces:
• Distribution: gain scalability through a distributed architecture
• Simplify the contract: improve scalability by picking appropriate ACID properties
• No disk: all data is held in RAM
These three non-functional themes lie behind the design of ODC, RBS's in-memory data warehouse.
ODC
ODC represents a balance between throughput and latency.
What is Latency?
Latency is a measure of response time.
What is Throughput?
Throughput is a measure of the amount of work (messages) consumed in a prescribed amount of time.
Which is best for latency?
[Latency spectrum: Traditional Database … Shared Nothing (Distributed) In-Memory Database]
Which is best for throughput?
[Throughput spectrum: Traditional Database … Shared Nothing (Distributed) In-Memory Database]
So why do we use distributed in-memory? In-memory storage on plentiful hardware gives us both latency and throughput.
ODC – Distributed, Shared Nothing, In-Memory, Semi-Normalised:
• Realtime graph DB
• 450 processes
• 2TB of RAM
• Messaging (topic-based) as a system of record (persistence)
The Layers: an Access Layer (Java client APIs), a Query Layer, a Data Layer (Transactions, Cashflows, MTMs) and a Persistence Layer.
Three Tools of Distributed Data Architecture:
• Indexing
• Replication
• Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scales: each partition owns a key range (e.g. keys Aa–Ap). Scalable storage, bandwidth and processing. But associating data in different partitions implies moving it.
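As a minimal sketch of what partitioning means here (hypothetical names, not ODC's actual code): a deterministic function maps each key to a node, and only keys that land on the same node can be joined without moving data.

```java
// Minimal sketch: hash-partition keys across a fixed set of nodes.
// Data that shares a partition can be joined locally; data that
// doesn't must cross the network.
class Partitioner {
    private final int nodeCount;

    Partitioner(int nodeCount) { this.nodeCount = nodeCount; }

    // Deterministic: the same key always lands on the same node.
    int nodeFor(String key) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    // Two keys can be joined without moving data only if they collocate.
    boolean collocated(String a, String b) {
        return nodeFor(a) == nodeFor(b);
    }
}
```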
So we have some data. Our data is bound together in a model (Trade, Party, Trader, Desk, Name, Sub…).
Which we save: the Trade, Party and Trader entities end up spread across different machines.
Binding them back together involves a "distributed join" => lots of network hops.
The hops have to be spread over time: each one is a separate trip across the network. Lots of network hops make it slow.
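A back-of-envelope model of why that hurts: sequential hops each pay a full round trip, so latency grows linearly with the number of entities chased across machines. The 500µs round trip below is an assumed figure, in the spirit of the Ethernet ping from the latency table earlier.

```java
// Each sequential hop in a distributed join pays a full network
// round trip, so total latency is hops * RTT.
class JoinCost {
    // rttMicros: assumed round-trip time per hop (e.g. ~500us on Ethernet)
    static long sequentialJoinMicros(int hops, long rttMicros) {
        return hops * rttMicros;
    }
}
```

Six hops at 500µs each is already 3ms, five orders of magnitude slower than the ~20ns HashMap lookup we started with.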
OK – what if we held it all together, "Denormalised"?
Hence denormalisation is FAST (for reads).
Denormalisation implies the duplication of some sub-entities…
…and that means managing consistency over lots of copies…
…and all the duplication means you run out of space really quickly.
Space issues are exacerbated further when data is versioned: each of Version 1, 2, 3, 4… duplicates the whole Trade/Party/Trader graph. And you need versioning to do MVCC.
And reconstituting a previous time slice becomes very difficult.
So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage.
Remember this means the object graph will be split across multiple machines. Entities are independently versioned, and each piece of data is a singleton.
Binding them back together involves a "distributed join" => lots of network hops.
Whereas in the denormalised model the join is already done.
So what we want is the advantages of a normalised store at the speed of a denormalised one. This is what using Snowflake Schemas and the Connected Replication pattern is all about.
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?
It's all about the keys.
We can collocate data with common keys, but if the keys crosscut, the only way to collocate is to replicate.
We tackle this problem with a hybrid model: the Trade is partitioned; Party and Trader are replicated.
We adapt the concept of a Snowflake Schema, taking the concept of Facts and Dimensions:
• Everything starts from a Core Fact (Trades for us)
• Facts are big; dimensions are small
• Facts have one key that relates them all (used to partition)
• Dimensions have many keys (which crosscut the partitioning key)
Looking at the data: Facts => big, with common keys. Dimensions => small, with crosscutting keys.
We remember we are a grid. We should avoid the distributed join… so we only want to 'join' data that is in the same process.
Trades and MTMs share a common key: use a key assignment policy (e.g. KeyAssociation in Coherence) to collocate them.
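The slide names Coherence's KeyAssociation; the sketch below is a hypothetical stand-in for the idea rather than Coherence's actual API: the MTM's key routes by its trade id, so an MTM always lands in the same partition as its Trade.

```java
// Sketch of key affinity (the idea behind KeyAssociation): a child
// entity's key routes by its parent's key, so an MTM is always stored
// in the same partition as the Trade it belongs to.
class MtmKey {
    final String mtmId;
    final String tradeId;   // the 'associated' key used for routing

    MtmKey(String mtmId, String tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    // Partitioning hashes the associated key, not the key itself.
    String associatedKey() { return tradeId; }
}

class AffinityPartitioner {
    private final int nodes;

    AffinityPartitioner(int nodes) { this.nodes = nodes; }

    int nodeForTrade(String tradeId) {
        return Math.floorMod(tradeId.hashCode(), nodes);
    }

    // MTMs route by their trade id, guaranteeing collocation.
    int nodeForMtm(MtmKey key) {
        return nodeForTrade(key.associatedKey());
    }
}
```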
So we prescribe different physical storage for Facts and Dimensions: the Trade is partitioned; Party and Trader are replicated.
Facts are partitioned; dimensions are replicated. The Facts – Transactions, Cashflows and MTMs – are distributed across partitioned storage in the Data Layer, while the dimensions are replicated to every node.
The data volumes back this up as a sensible hypothesis: Facts => big => distribute. Dimensions => small => replicate.
Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.
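A minimal sketch of the resulting storage layout (hypothetical names, not ODC's code): each node holds only its slice of the facts but a full replica of the dimensions, so a fact-to-dimension join is a local map lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Each node stores only its partition of the Facts (trades) but a
// full replica of the small Dimensions (parties), so a
// fact-to-dimension join never leaves the process.
class Node {
    final Map<String, String> tradesByKey = new HashMap<>();   // partitioned
    final Map<String, String> partiesById = new HashMap<>();   // replicated

    // The join is two local map lookups: no network hop needed.
    String tradeWithParty(String tradeKey, String partyId) {
        return tradesByKey.get(tradeKey) + " / " + partiesById.get(partyId);
    }
}
```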
So how does this help us to run queries without distributed joins?
This query involves both joins between Dimensions and joins between Facts:
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern? A chain of sequential lookups spread across the network and over time: Get Cost Centers → Get LedgerBooks → Get SourceBooks → Get Transactions → Get MTMs → Get Legs → Get Cost Centers.
But by balancing Replication and Partitioning we don't need all those hops.
Stage 1: Focus on the where clause – Where Cost Centre = 'CC1'.
Stage 1: Get the right keys to query the Facts – join the Dimensions in the Query Layer to resolve the where clause, while the Facts (Transactions, Cashflows, MTMs) stay in partitioned storage.
Stage 2: Cluster Join to get the Facts – join the facts together across the cluster; this is efficient because we know they are collocated.
Stage 3: Augment the raw Facts with relevant Dimensions – bind the relevant dimensions onto the result in the Query Layer.
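The three stages can be sketched over two in-memory maps (hypothetical names; a toy stand-in for ODC's query layer, not its real implementation):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy three-stage query:
// 1) resolve the where-clause against a replicated dimension to get fact keys,
// 2) fetch the (collocated) facts by those keys,
// 3) bind the dimension data onto each fact.
class QueryLayer {
    final Map<String, String> costCentreByTradeKey = new HashMap<>(); // replicated dimension
    final Map<String, String> tradesByKey = new HashMap<>();          // partitioned facts

    List<String> query(String costCentre) {
        // Stage 1: dimension join in the query layer -> fact keys
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> e : costCentreByTradeKey.entrySet())
            if (e.getValue().equals(costCentre)) keys.add(e.getKey());
        // Stage 2: fetch facts by key (collocated in the real cluster)
        // Stage 3: bind the dimension onto each fact
        List<String> out = new ArrayList<>();
        for (String k : keys)
            out.add(tradesByKey.get(k) + " [" + costCentre + "]");
        return out;
    }
}
```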
Bringing it together: the Java client API sees replicated Dimensions and partitioned Facts. We never have to do a distributed join: all the big stuff is held partitioned, and we can join without shipping keys around and building intermediate results.
We get to do this… (normalised, singleton storage) …and this… (independent versioning) …and this… (time-slice reconstitution) …without the problems of this… (duplication) …or this… (consistency over many copies) …all at the speed of this… well, almost.
But there is a fly in the ointment…
I lied earlier: these aren't all Facts. Some are Dimensions. This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.
Fortunately there is a simple solution: the Connected Replication Pattern.
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".
If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.
Looking at the Dimension data, some are quite large. But Connected Dimension Data is tiny by comparison.
One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'Connected' or 'Used' dimensions.
As data is written to the data store we keep our 'Connected Caches' up to date: the Data Layer holds the Facts (Transactions, Cashflows, MTMs) in partitioned Fact storage, while the Processing Layer holds the replicated Dimension Caches.
As new Facts are added, the relevant Dimensions that they reference are moved to the processing layer caches.
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change. Saving a trade causes all its 1st-level references to be triggered.
[Diagram: a 'Save Trade' hits the partitioned cache in the Data Layer (all normalised); a cache-store trigger fires, pushing the Trade's first-level references (Party, Alias, Source Book, Ccy) to the connected dimension caches in the Query Layer.]
This updates the connected caches in the Query Layer.
The process recurses through the object graph: the Party in turn pulls in its LedgerBook, and so on.
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
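A minimal sketch of the pattern (hypothetical names, not ODC's implementation): saving a fact walks its foreign-key arcs and marks every reachable dimension as connected; only those dimensions are then pushed to the replicated caches.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Connected Replication sketch: on save, recurse through the entity's
// foreign-key arcs and mark every dimension it touches as 'connected'.
class ConnectedReplication {
    // adjacency: entity id -> ids of the dimensions it references
    final Map<String, List<String>> refs = new HashMap<>();
    final Set<String> connected = new HashSet<>();

    void onSave(String entityId) {
        for (String dim : refs.getOrDefault(entityId, List.of())) {
            if (connected.add(dim))   // add() returns false if already seen
                onSave(dim);          // recurse through the object graph
        }
    }
}
```

Because the recursion only visits dimensions actually reachable from saved facts, unreferenced dimensions (e.g. counterparties with no trades) are never replicated.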
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
We get massive amounts of parallel processing
But at the cost of losing the single address space
(Diagram: Traditional, Shared Disk, Shared Nothing, In Memory, Distributed In Memory, Simpler Contract)
Key Point 4: There are three key forces
Distribution: gain scalability through a distributed architecture
Simplify the contract: improve scalability by picking appropriate ACID properties
No Disk: all data is held in RAM
These three non-functional themes lie behind the design of ODC, RBS's in-memory data warehouse
ODC represents a balance between throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the amount of work (messages) processed in a prescribed amount of time
Which is best for latency? (Chart: Traditional Database vs Shared Nothing (Distributed) In-Memory Database)
Which is best for throughput? (Chart: Traditional Database vs Shared Nothing (Distributed) In-Memory Database)
So why do we use distributed in-memory? In Memory plus plentiful hardware gives us both Latency and Throughput
ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised, Realtime Graph DB
450 processes, 2TB of RAM
Messaging (topic based) as a system of record (persistence)
The Layers: an Access Layer (Java client API) over a Query Layer, over a Data Layer (Transactions, Cashflows, MTMs), over a Persistence Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools?
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scales (e.g. keys Aa-Ap live on one node)
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some data. Our data is bound together in a model (Trade, Party, Trader, Desk, Name, Sub...)
Which we save: Trade, Party and Trader are stored as separate entities
Binding them back together involves a "distributed join" => lots of network hops
The hops have to be spread over time
Lots of network hops makes it slow
OK – what if we held it all together? "Denormalised"
Hence denormalisation is FAST (for reads)
Denormalisation implies the duplication of some sub-entities
…and that means managing consistency over lots of copies
…and all the duplication means you run out of space really quickly
Space issues are exacerbated further when data is versioned
(Trade, Party and Trader duplicated again for each of Versions 1-4)
…and you need versioning to do MVCC
And reconstituting a previous time slice becomes very difficult
So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage
Remember: this means the object graph will be split across multiple machines. Each entity is independently versioned, and each datum is a singleton.
Binding them back together involves a "distributed join" => lots of network hops
Whereas in the denormalised model the join is already done
So what we want is the advantages of a normalised store at the speed of a denormalised one. This is what using Snowflake Schemas and the Connected Replication pattern is all about.
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?
It's all about the keys
We can collocate data with common keys, but if the keys crosscut, the only way to collocate is to replicate
We tackle this problem with a hybrid model: Trade is partitioned; Party and Trader are replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are big; dimensions are small
Facts have one key that relates them all (used to partition)
Dimensions have many keys (which crosscut the partitioning key)
Looking at the data: Facts => big, with common keys. Dimensions => small, with crosscutting keys.
We remember we are a grid: we should avoid the distributed join… so we only want to 'join' data that is in the same process
Trades and MTMs share a common key, so use a key assignment policy (e.g. KeyAssociation in Coherence) to collocate them
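The intent of such a policy can be sketched in plain Java (the class and method names below are invented for illustration; this is not Coherence's actual API surface): hash only the shared association key, so a Trade and its MTMs always land in the same partition.

```java
public class KeyRouting {
    static final int PARTITIONS = 13;

    // Partition on the association key alone, never on the full composite key.
    static int partitionOf(String associatedKey) {
        return Math.floorMod(associatedKey.hashCode(), PARTITIONS);
    }

    // A Trade is keyed by its trade id...
    static int tradePartition(String tradeId) {
        return partitionOf(tradeId);
    }

    // ...while an MTM's key also carries a version, which would crosscut the
    // partitioning; associating on the trade id keeps the two collocated.
    static int mtmPartition(String tradeId, int version) {
        return partitionOf(tradeId);
    }
}
```

In Coherence itself the same effect is achieved by having the key expose its associated key (KeyAssociation), so the cluster routes both entry types to one partition.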
So we prescribe different physical storage for Facts and Dimensions: Trade (the Fact) is partitioned; Party and Trader (the Dimensions) are replicated
Facts are partitioned; dimensions are replicated. In the data layer, Transactions, Cashflows and MTMs sit in partitioned fact storage (distribute/partition), while the dimensions are replicated.
The data volumes back this up as a sensible hypothesis
Facts => big => distribute. Dimensions => small => replicate.
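A toy sketch of that split (all class and method names invented, not ODC's API): facts hash into one map per partition, dimensions are copied onto every node, so a fact-to-dimension lookup never crosses the network.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HybridStore {
    final int partitions;
    final List<Map<String, String>> facts;           // one map per partition
    final List<Map<String, String>> dimensionCopies; // one full copy per node

    HybridStore(int partitions) {
        this.partitions = partitions;
        this.facts = new ArrayList<>();
        this.dimensionCopies = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            facts.add(new HashMap<>());
            dimensionCopies.add(new HashMap<>());
        }
    }

    void putFact(String key, String value) {         // big stuff: distribute
        facts.get(Math.floorMod(key.hashCode(), partitions)).put(key, value);
    }

    void putDimension(String key, String value) {    // small stuff: replicate
        for (Map<String, String> copy : dimensionCopies) copy.put(key, value);
    }

    // The 'join' happens entirely inside the partition that owns the fact.
    String localJoin(String factKey, String dimKey) {
        int p = Math.floorMod(factKey.hashCode(), partitions);
        return facts.get(p).get(factKey) + " / " + dimensionCopies.get(p).get(dimKey);
    }
}
```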
Key Point: we use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate the small stuff whose keys can't map to our partitioning key
So how does this help us to run queries without distributed joins?
This query involves joins between Dimensions and joins between Facts:
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern? A long chain of hops spread over time: Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs…
But by balancing Replication and Partitioning we don't need all those hops
Stage 1: Focus on the where clause (Where Cost Centre = 'CC1'). Join the Dimensions in the query layer to get the right keys to query the Facts in partitioned storage.
Stage 2: Cluster join to get the Facts. Join the facts together efficiently, as we know they are collocated in the same partition.
Stage 3: Augment the raw Facts with the relevant Dimensions, binding them to the result in the query layer.
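The three stages can be strung together in a small sketch (the tables, keys and values here are invented for illustration): stage 1 resolves the where clause against replicated dimensions, stage 2 joins the collocated facts, stage 3 binds dimension data onto the result.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ThreeStageQuery {
    // Replicated dimensions (available in every process).
    static final Map<String, String> costCentreOfBook = Map.of("bookA", "CC1", "bookB", "CC2");
    static final Map<String, String> bookOfTrade = Map.of("t1", "bookA", "t2", "bookB");
    // Partitioned facts: each MTM shares its trade's partitioning key.
    static final Map<String, Double> mtmOfTrade = Map.of("t1", 10.0, "t2", 20.0);

    static List<String> query(String costCentre) {
        // Stage 1: join dimensions in the query layer to get the fact keys.
        Set<String> tradeKeys = bookOfTrade.entrySet().stream()
                .filter(e -> costCentre.equals(costCentreOfBook.get(e.getValue())))
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
        // Stage 2: join trade and MTM facts; collocation makes this local.
        // Stage 3: bind the relevant dimension values onto the result rows.
        return tradeKeys.stream()
                .map(t -> t + ":" + mtmOfTrade.get(t) + ":" + costCentre)
                .sorted()
                .collect(Collectors.toList());
    }
}
```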
Bringing it together: a Java client API over replicated Dimensions and partitioned Facts. We never have to do a distributed join: all the big stuff is held partitioned, and we can join without shipping keys around or building intermediate results.
We get to do this… (hold Trade, Party and Trader normalised) …and this… (version them independently) …and this (reconstitute previous time slices) …without the problems of this… (consistency over many copies) …or this (the space blow-up), all at the speed of this… well, almost.
But there is a fly in the ointment…
I lied earlier: these aren't all Facts.
This is a dimension: it has a different key to the Facts, and it's BIG.
We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem
Fortunately there is a simple solution: the Connected Replication pattern.
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".
If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80% of data remains unused
So we only replicate 'Connected' or 'Used' dimensions
As data is written to the data store we keep our 'Connected Caches' up to date. (Data Layer: Transactions, Cashflows and MTMs in partitioned fact storage; Processing Layer: replicated dimension caches.)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all its first-level references to be triggered
(Save Trade goes to the partitioned cache and fires a cache store trigger. The Trade references Party Alias, Source Book and Ccy; the data layer is all normalised, the query layer holds the connected dimension caches.)
This updates the connected caches: Party Alias, Source Book and Ccy appear in the query layer's connected dimension caches.
The process recurses through the object graph: Party Alias pulls in Party; Source Book pulls in LedgerBook.
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
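A minimal sketch of that recursion (the reference graph and entity names are invented): saving a fact walks its foreign-key arcs and adds only the dimensions it can reach to the replicated 'connected' layer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ConnectedReplication {
    // entity -> the dimension entities it references (its foreign-key arcs)
    static final Map<String, List<String>> refs = new HashMap<>();
    // the replicated layer: only dimensions reachable from some fact
    static final Set<String> connectedCache = new HashSet<>();

    static void saveFact(String factKey) {
        Deque<String> toVisit = new ArrayDeque<>(refs.getOrDefault(factKey, List.of()));
        while (!toVisit.isEmpty()) {
            String dim = toVisit.pop();
            if (connectedCache.add(dim)) // recurse only through newly connected dims
                toVisit.addAll(refs.getOrDefault(dim, List.of()));
        }
    }
}
```

Dimensions never referenced by any saved fact (the Goldmans Counterparty in the example above) are simply never replicated.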
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)
Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared nothing architectures. These favour scalability.
• At the other end are in memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step…
• …with a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
[Diagram: the landscape – Traditional, Distributed In-Memory, Shared Disk, In-Memory, Shared Nothing, Simpler Contract]
Key Point 4: There are three key forces
Distribution: gain scalability through a distributed architecture
Simplify the contract: improve scalability by picking appropriate ACID properties
No Disk: all data is held in RAM
These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse
ODC
ODC represents a balance between throughput and latency
What is Latency?
Latency is a measure of response time
What is Throughput?
Throughput is a measure of the consumption of work/messages in a prescribed amount of time
Which is best for latency?
[Diagram: latency spectrum from Traditional Database to Shared Nothing (Distributed) In-Memory Database]
Which is best for throughput?
[Diagram: throughput spectrum from Traditional Database to Shared Nothing (Distributed) In-Memory Database]
So why do we use distributed in-memory?
[Diagram: In Memory + Plentiful hardware => both Latency and Throughput]
ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised
Realtime Graph DB
450 processes
2TB of RAM
Messaging (Topic Based) as a system of record (persistence)
The Layers
[Diagram: Access Layer (Java client API) → Query Layer → Data Layer (Transactions, Cashflows, MTMs) → Persistence Layer]
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools?
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scales [Diagram: each partition owns a key range, e.g. Keys Aa–Ap]
Scalable storage, bandwidth and processing
Associating data in different partitions implies moving it
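The trade-off above can be sketched as a hash partitioner: every process agrees deterministically which node owns a key, so storage and work spread as nodes are added, but two related keys may land on different partitions. This is an illustrative sketch only; the node names and modulo scheme are assumptions, not ODC internals.

```python
# Illustrative sketch: hash partitioning spreads keys over nodes.
from zlib import crc32

NODES = ["node-0", "node-1", "node-2", "node-3"]

def partition_for(key: str) -> str:
    """Deterministically map a key to the node that owns it."""
    return NODES[crc32(key.encode()) % len(NODES)]

# Every process agrees where a key lives, so no directory lookup is needed...
assert partition_for("trade-42") == partition_for("trade-42")

# ...but two related keys may live on different nodes, which is exactly
# the "associating data in different partitions implies moving it" problem.
print(partition_for("trade-42"), partition_for("party-7"))
```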
So we have some data. Our data is bound together in a model:
[Diagram: object model – Trade, Party, Trader, Desk, Name, Sub]
Which we save…
[Diagram: Trade, Party and Trader objects saved across two nodes]
Binding them back together involves a "distributed join" => Lots of network hops
[Diagram: Trade, Party and Trader re-joined across nodes]
The hops have to be spread over time
[Diagram: network hops laid out along a time axis]
Lots of network hops makes it slow
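Rough back-of-envelope arithmetic shows why: chaining hops serialises their latencies. The figures below are illustrative assumptions in line with the earlier "HashMap ~20 ns vs database ~20 ms" comparison, not measurements of any real system.

```python
# Back-of-envelope sketch: sequential network hops dominate query time.
# Both latency figures are illustrative assumptions, not measurements.
IN_PROCESS_LOOKUP_S = 20e-9   # ~20 ns, like a HashMap get
NETWORK_HOP_S = 500e-6        # ~0.5 ms per round trip (assumed)

hops = 5  # e.g. Trade -> Party -> Trader -> Desk -> ...
distributed_join = hops * NETWORK_HOP_S
local_joins = hops * IN_PROCESS_LOOKUP_S

print(f"distributed: {distributed_join * 1e3:.2f} ms, "
      f"local: {local_joins * 1e9:.0f} ns")

# Several orders of magnitude apart, even before queuing and serialisation.
assert distributed_join / local_joins > 10_000
```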
OK – what if we held it all together, "Denormalised"?
Hence denormalisation is FAST (for reads)
Denormalisation implies the duplication of some sub-entities
…and that means managing consistency over lots of copies
…and all the duplication means you run out of space really quickly
Space issues are exacerbated further when data is versioned
[Diagram: four denormalised copies – Trade, Party, Trader – at Version 1 through Version 4]
…and you need versioning to do MVCC
And reconstituting a previous time slice becomes very difficult
[Diagram: assorted Trade, Party and Trader versions that must be stitched back into one consistent snapshot]
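A minimal sketch of why versioning matters here: if each entity keeps a version history, reconstituting a time slice is just an "as-of" read per entity. The data structure below is hypothetical, purely to illustrate the idea; it is not how ODC stores versions.

```python
# Minimal MVCC sketch: keep a (version, value) history per key and read
# "as of" a version to reconstitute a consistent time slice.
from bisect import bisect_right

history = {
    "trader-1": [(1, "Dave"), (3, "Dave K")],     # (version, value) pairs
    "party-9":  [(1, "ACME"), (2, "ACME Ltd")],
}

def read_as_of(key: str, version: int):
    """Return the latest value whose version <= the requested one."""
    versions = [v for v, _ in history[key]]
    i = bisect_right(versions, version)
    return history[key][i - 1][1] if i else None

assert read_as_of("trader-1", 2) == "Dave"      # version 3 not yet visible
assert read_as_of("party-9", 2) == "ACME Ltd"
```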
So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
[Diagram: Trade, Party and Trader on separate nodes – each Independently Versioned, each piece of Data a Singleton]
Binding them back together involves a "distributed join" => Lots of network hops
[Diagram: Trade, Party and Trader re-joined across nodes]
Whereas in the denormalised model the join is already done
So what we want are the advantages of a normalised store at the speed of a denormalised one
This is what Snowflake Schemas and the Connected Replication pattern are all about
Looking more closely: Why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?
It's all about the keys
We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate
[Diagram: Common Keys vs Crosscutting Keys]
We tackle this problem with a hybrid model
[Diagram: Trade partitioned; Party and Trader replicated]
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are big, dimensions are small
Facts have one key that relates them all (used to partition)
Dimensions have many keys (which crosscut the partitioning key)
Looking at the data:
Facts => big, common keys
Dimensions => small, crosscutting keys
We remember we are a grid: we should avoid the distributed join
… so we only want to 'join' data that is in the same process
[Diagram: Trades and MTMs sharing a Common Key]
Use a Key Assignment Policy (e.g. KeyAssociation in Coherence)
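A sketch of what such a key-assignment policy buys us: if an MTM's partitioning key is derived from its parent trade's key, both always land in the same partition and can be joined in-process. In Coherence this is expressed through KeyAssociation; the code below only mimics the idea in plain Python, with invented field names.

```python
# Sketch of key association: facts that share a partitioning key collocate.
# The partition count and record shapes are illustrative assumptions.
from zlib import crc32

PARTITIONS = 8

def owning_partition(partition_key: str) -> int:
    """Which partition owns this key (same on every node)."""
    return crc32(partition_key.encode()) % PARTITIONS

# Each fact carries its own id plus the key used for partitioning.
# The MTM is partitioned by its parent trade's id (the "associated" key),
# so Trade and MTM always collocate and can be joined locally.
trade = {"id": "trade-42", "partition_key": "trade-42"}
mtm   = {"id": "mtm-7",    "partition_key": "trade-42"}  # associated key

assert owning_partition(trade["partition_key"]) == \
       owning_partition(mtm["partition_key"])
```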
So we prescribe different physical storage for Facts and Dimensions
[Diagram: Trade partitioned; Party and Trader replicated]
Facts are partitioned, dimensions are replicated
[Diagram: Data Layer with Fact Storage (Partitioned) holding Transactions, Cashflows and MTMs; Query Layer above with Trade, Party, Trader]
Facts are partitioned, dimensions are replicated
[Diagram: Facts (distribute/partition) – Transactions, Cashflows, MTMs in Fact Storage (Partitioned); Dimensions (replicate) above]
The data volumes back this up as a sensible hypothesis
Facts => big => distribute
Dimensions => small => replicate
Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key
[Diagram: Replicate vs Distribute]
So how does this help us to run queries without distributed joins?
This query involves:
• Joins between Dimensions
• Joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern?
[Diagram: sequential hops along a time axis – Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs]
But by balancing Replication and Partitioning we don't need all those hops
Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'
[Diagram: Transactions, Cashflows and MTMs in Partitioned Storage]
Stage 1: Get the right keys to query the Facts – Join Dimensions in Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
[Diagram: Transactions, Cashflows and MTMs in Partitioned Storage]
Stage 2: Cluster Join to get Facts – Join Dimensions in Query Layer, then Join Facts across cluster
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 2: Join the facts together efficiently, as we know they are collocated
[Diagram: Transactions, Cashflows and MTMs in Partitioned Storage]
Stage 3: Augment raw Facts with relevant Dimensions – Join Dimensions in Query Layer, Join Facts across cluster, Join Dimensions in Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 3: Bind relevant dimensions to the result
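The three stages can be sketched end-to-end against a toy hybrid store. Everything below (the data, field names and structure) is invented for illustration: resolve the where-clause against the locally replicated dimensions, sweep the fact partitions by the resolved keys, then decorate each fact from the dimension caches.

```python
# Sketch of the three query stages against a toy hybrid store.
# All data, names and shapes here are invented for illustration.
replicated_dims = {  # dimensions: small, fully replicated on every node
    "cost_centre": {"CC1": {"ref": "Rates Desk"}},
}
partitioned_facts = {  # facts: big, partitioned by trade id
    "trade-1": {"cost_centre": "CC1", "mtm": 1_000.0},
    "trade-2": {"cost_centre": "CC2", "mtm": -250.0},
}

# Stage 1: evaluate the where clause against the local dimension caches.
wanted_cc = {cc for cc in replicated_dims["cost_centre"] if cc == "CC1"}

# Stage 2: one parallel sweep of the fact partitions (no key shipping,
# no intermediate results moved between nodes).
facts = [f for f in partitioned_facts.values()
         if f["cost_centre"] in wanted_cc]

# Stage 3: bind dimension data to each fact from the replicated caches.
results = [{**f, **replicated_dims["cost_centre"][f["cost_centre"]]}
           for f in facts]

assert results == [{"cost_centre": "CC1", "mtm": 1000.0, "ref": "Rates Desk"}]
```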
Bringing it together
[Diagram: Java client API over Replicated Dimensions over Partitioned Facts]
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and having intermediate results
We get to do this…
[Diagram: normalised Trade, Party and Trader held separately across nodes]
…and this…
[Diagram: Trade, Party, Trader at Version 1 through Version 4]
and this
[Diagram: reconstituting a previous time slice]
…without the problems of this…
…or this
all at the speed of this… well, almost
But there is a fly in the ointment…
I lied earlier: these aren't all Facts
[Diagram: Facts vs Dimensions]
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem
Fortunately there is a simple solution:
The Connected Replication Pattern
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".
If there are no Trades for Goldmans in the data store, then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data, some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80% of data remains unused
So we only replicate 'Connected' or 'Used' dimensions
As data is written to the data store we keep our 'Connected Caches' up to date
[Diagram: Data Layer with Fact Storage (Partitioned) – Transactions, Cashflows, MTMs; Processing Layer with Dimension Caches (Replicated)]
As new Facts are added, relevant Dimensions that they reference are moved to processing-layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all its 1st-level references to be triggered
[Diagram: Save Trade → Partitioned Cache → Cache Store → Trigger; Trade references Party, Alias, Source, Book and Ccy; Data Layer (All Normalised), Query Layer (with connected dimension Caches)]
This updates the connected caches
[Diagram: Trade's references – Party, Alias, Source, Book, Ccy – now present in the Query Layer's connected dimension Caches]
The process recurses through the object graph
[Diagram: the walk continues to second-level references such as Party's LedgerBook; Data Layer (All Normalised), Query Layer (with connected dimension Caches)]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)
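The recursion can be sketched as a graph walk from the saved fact through its foreign keys; only dimensions actually reached (i.e. "connected") end up in the replicated cache. The entity names mirror the slides, but the object graph and trigger shape below are hypothetical.

```python
# Sketch of the Connected Replication walk: on saving a fact, recurse
# through its references and replicate only the dimensions reached.
# The object graph here is hypothetical, for illustration only.
graph = {
    "trade-42": ["party-1", "book-7", "ccy-USD"],
    "party-1":  ["ledgerbook-3"],
    "book-7":   [],
    "ccy-USD":  [],
    "ledgerbook-3": [],
}

connected_cache: set[str] = set()

def on_save(entity: str) -> None:
    """Trigger fired when a fact (or dimension) is written."""
    for ref in graph[entity]:
        if ref not in connected_cache:      # avoid re-walking the graph
            connected_cache.add(ref)        # replicate this dimension
            on_save(ref)                    # recurse through the arcs

on_save("trade-42")
assert connected_cache == {"party-1", "book-7", "ccy-USD", "ledgerbook-3"}
# A dimension never referenced by any fact is never replicated.
assert "goldmans" not in connected_cache
```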
Limitations of this approach:
• Data set size: the size of connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared-nothing architectures. These favour scalability.
Conclusion
At the other end are in-memory architectures, ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning, we can do any join in a single step
[Diagram: Partitioned Storage]
Conclusion
With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern
The End
• Further details online: http://www.benstopford.com
• Questions?
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Key Point 4There are three key forces
Distribution
Gain scalability through a distributed architecture
Simplify the
contract
Improve scalability by picking appropriate ACID properties
No Disk
All data is held in RAM
These three non-functional
themes lay behind the design of ODC RBSrsquos in-memory data warehouse
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the speed of a denormalised one
- Looking more closely: Why does normalisation mean we have to spread data around the cluster?
- It's all about the keys
- We can collocate data with common keys but if they crosscut, the only way to collocate is to replicate
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are big, dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key)
- Looking at the data
- We remember we are a grid: we should avoid the distributed join
- … so we only want to 'join' data that is in the same process
- So we prescribe different physical storage for Facts and Dimensions
- Facts are partitioned, dimensions are replicated
- Facts are partitioned, dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does this help us to run queries without distributed joins?
- What would this look like without this pattern?
- But by balancing Replication and Partitioning we don't need all those hops
- Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'
- Stage 1: Get the right keys to query the Facts
- Stage 2: Cluster Join to get Facts
- Stage 2: Join the facts together efficiently, as we know they are collocated
- Stage 3: Augment raw Facts with relevant Dimensions
- Stage 3: Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do this…
- …and this…
- …and this…
- …without the problems of this…
- …or this
- all at the speed of this… well, almost
- Slide 121
- But there is a fly in the ointment…
- I lied earlier: these aren't all Facts
- We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions, a large majority are never used
- If there are no Trades for Goldmans in the data store, then a Trade Query will never need the Goldmans Counterparty
- Looking at the Dimension data, some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed that 80% of data remains unused
- Slide 131
- As data is written to the data store we keep our 'Connected Caches' up to date
- The Replicated Layer is updated by recursing through the arcs on the domain model
- Saving a trade causes all its 1st-level references to be triggered
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
These three non-functional themes lay behind the design of ODC, RBS's in-memory data warehouse.
ODC
ODC represents a balance between throughput and latency.
What is Latency?
Latency is a measure of response time.
What is Throughput?
Throughput is a measure of the work (or messages) consumed in a prescribed amount of time.
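The distinction matters when comparing architectures below; here is a minimal sketch of the two measures (the figures are illustrative, not ODC numbers):

```java
// Latency vs throughput: the same batch of work yields two different
// numbers. All figures below are illustrative.
public class LatencyVsThroughput {

    // Latency: average time one request takes, in milliseconds.
    static double latencyMs(double totalMs, int requests) {
        return totalMs / requests;
    }

    // Throughput: requests completed per second.
    static double throughputPerSec(double totalMs, int requests) {
        return requests / (totalMs / 1000.0);
    }

    public static void main(String[] args) {
        // 1000 requests complete in 2000 ms:
        System.out.println(latencyMs(2000, 1000));        // 2.0 (ms per request)
        System.out.println(throughputPerSec(2000, 1000)); // 500.0 (requests per second)
    }
}
```

A batched or pipelined system can push throughput up while leaving per-request latency unchanged, which is why the two pull architectures in different directions.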
Which is best for latency?
Latency: Traditional Database vs Shared Nothing (Distributed) vs In-Memory Database.
Which is best for throughput?
Throughput: Traditional Database vs Shared Nothing (Distributed) vs In-Memory Database.
So why do we use distributed in-memory?
In Memory + plentiful hardware => Latency + Throughput.
ODC – Distributed, Shared Nothing, In-Memory, Semi-Normalised
Realtime Graph DB
450 processes
Messaging (topic based) as a system of record (persistence)
2TB of RAM
The Layers
Access Layer: Java client API
Query Layer
Data Layer: Transactions, Cashflows, Mtms
Persistence Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools?
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scales (Keys Aa-Ap)
Scalable storage, bandwidth and processing
Associating data in different partitions implies moving it
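The contrast can be made concrete with a toy placement model (not any real grid product's API; the node count and keys are illustrative): replication puts every entry on every node, so one node's memory caps total capacity, while partitioning assigns each key to exactly one node, so capacity scales with the cluster.

```java
import java.util.HashSet;
import java.util.Set;

// Toy placement model contrasting replication and partitioning.
public class ReplicationVsPartitioning {
    static final int NODES = 4;

    // Partitioning: each key lives on exactly one node, chosen by hash.
    static int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), NODES);
    }

    // Replication: every node holds a copy of every key.
    static Set<Integer> replicasFor(String key) {
        Set<Integer> all = new HashSet<>();
        for (int n = 0; n < NODES; n++) all.add(n);
        return all;
    }

    public static void main(String[] args) {
        // A replicated entry occupies all four nodes; a partitioned one, just one.
        System.out.println(replicasFor("Trade:42").size()); // 4
        System.out.println(partitionFor("Trade:42"));       // a single node id in [0, 4)
    }
}
```

The catch, as the next slides show: once related entries hash to different partitions, associating them means moving data across the network.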
So we have some data. Our data is bound together in a model (Trade, Party, Trader, Desk, Name, Sub…).
Which we save.
Binding them back together involves a "distributed join" => lots of network hops.
The hops have to be spread over time.
Lots of network hops makes it slow.
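Why the hops serialise can be sketched as follows (the node count and keys are illustrative): each foreign key is only known after the previous entity has been fetched, so every cross-partition edge costs a sequential network round trip.

```java
import java.util.List;

// Counts sequential cross-node hops when walking a normalised
// object graph (e.g. Trade -> Party -> Trader) spread over partitions.
public class DistributedJoinHops {
    static final int NODES = 4;

    static int nodeOf(String key) {
        return Math.floorMod(key.hashCode(), NODES);
    }

    // Walk a chain of foreign keys, counting moves to a different node.
    static int hops(List<String> keyChain) {
        int count = 0;
        int current = nodeOf(keyChain.get(0));
        for (String key : keyChain.subList(1, keyChain.size())) {
            int target = nodeOf(key);
            if (target != current) count++; // the data is elsewhere: cross the network
            current = target;
        }
        return count;
    }

    public static void main(String[] args) {
        // Up to two sequential hops for a three-entity chain,
        // depending on where the keys happen to hash.
        System.out.println(hops(List.of("Trade:1", "Party:7", "Trader:3")));
    }
}
```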
OK – what if we held it all together, "Denormalised"?
Hence denormalisation is FAST (for reads).
Denormalisation implies the duplication of some sub-entities…
…and that means managing consistency over lots of copies…
…and all the duplication means you run out of space really quickly.
Space issues are exacerbated further when data is versioned.
Trade, Party, Trader – Version 1
Trade, Party, Trader – Version 2
Trade, Party, Trader – Version 3
Trade, Party, Trader – Version 4
…and you need versioning to do MVCC.
And reconstituting a previous time slice becomes very difficult.
So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage.
Remember this means the object graph will be split across multiple machines
Independently Versioned
Data is Singleton
Binding them back together involves a "distributed join" => lots of network hops.
Whereas in the denormalised model the join is already done.
So what we want is the advantages of a normalised store at the speed of a denormalised one. This is what using Snowflake Schemas and the Connected Replication pattern is all about.
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?
It's all about the keys.
We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate: common keys versus crosscutting keys.
We tackle this problem with a hybrid model: Trade is partitioned; Party and Trader are replicated.
We adapt the concept of a Snowflake Schema, taking the concept of Facts and Dimensions.
Everything starts from a Core Fact (Trades for us).
Facts are big; dimensions are small.
Facts have one key that relates them all (used to partition).
Dimensions have many keys (which crosscut the partitioning key).
Looking at the data: Facts => big, common keys; Dimensions => small, crosscutting keys.
We remember we are a grid: we should avoid the distributed join…
…so we only want to 'join' data that is in the same process.
Trades and MTMs share a common key: use a key assignment policy (e.g. KeyAssociation in Coherence).
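A self-contained analogue of such a key-assignment policy (Coherence's real KeyAssociation interface differs in detail; all names here are illustrative): the MTM's key exposes the trade key it is associated with, and the partitioning function routes on that, so a Trade and its MTMs always land in the same partition.

```java
import java.util.Objects;

// Key association: route an entry by an associated (parent) key rather
// than its own key, so related facts are collocated.
public class KeyAssociationSketch {

    interface Associated {
        Object associatedKey();
    }

    // An MTM is identified by its own id but partitioned by its trade id.
    record MtmKey(long mtmId, long tradeId) implements Associated {
        public Object associatedKey() { return tradeId; }
    }

    static final int NODES = 4;

    static int partitionFor(Object key) {
        Object routing = (key instanceof Associated a) ? a.associatedKey() : key;
        return Math.floorMod(Objects.hashCode(routing), NODES);
    }

    public static void main(String[] args) {
        long tradeId = 42L;
        // The trade and all of its MTMs hash to the same partition,
        // so the Trade/MTM join never leaves the process.
        System.out.println(partitionFor(tradeId) == partitionFor(new MtmKey(1, tradeId))); // true
    }
}
```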
So we prescribe different physical storage for Facts and Dimensions: Trade is partitioned; Party and Trader are replicated.
Facts are partitioned, dimensions are replicated.
Fact storage (partitioned): Transactions, Cashflows, Mtms => distribute/partition.
Dimensions => replicate.
The data volumes back this up as a sensible hypothesis: Facts => big => distribute; Dimensions => small => replicate.
Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.
So how does this help us to run queries without distributed joins?
This query involves:
• Joins between Dimensions
• Joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern?
Get Cost Centers, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs… each step a further round of hops spread over the network and time.
But by balancing Replication and Partitioning we don't need all those hops.
Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'.
Stage 1: Get the right keys to query the Facts by joining Dimensions in the Query Layer.
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 2: Cluster join to get Facts: join Facts across the cluster.
Stage 2: Join the facts together efficiently, as we know they are collocated in Partitioned Storage.
Stage 3: Augment raw Facts with relevant Dimensions, joining Dimensions in the Query Layer once more.
Stage 3: Bind relevant dimensions to the result.
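The three stages can be sketched end to end with in-memory maps standing in for the replicated and partitioned stores (all keys, values, and figures are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of the three-stage query: Stage 1 joins replicated dimensions
// locally to resolve the where clause into fact keys; Stage 2 joins the
// facts, collocated via key association; Stage 3 binds dimensions on.
public class ThreeStageQuery {
    // Replicated dimension: cost centre -> trade keys (held on every node).
    static Map<String, List<Long>> tradesByCostCentre =
        Map.of("CC1", List.of(1L, 2L), "CC2", List.of(3L));
    // Partitioned facts: trade key -> MTM value (collocated with the trade).
    static Map<Long, Double> mtmByTrade = Map.of(1L, 10.0, 2L, -4.0, 3L, 7.5);
    // Replicated dimension used to augment results.
    static Map<Long, String> counterpartyByTrade =
        Map.of(1L, "PartyA", 2L, "PartyB", 3L, "PartyC");

    static List<String> query(String costCentre) {
        List<String> rows = new ArrayList<>();
        // Stage 1: dimension join in the query layer resolves the where clause to keys.
        for (long tradeKey : tradesByCostCentre.getOrDefault(costCentre, List.of())) {
            // Stage 2: fact join, single-step because the facts share a partition key.
            double mtm = mtmByTrade.get(tradeKey);
            // Stage 3: bind replicated dimensions to the result.
            rows.add(tradeKey + "," + mtm + "," + counterpartyByTrade.get(tradeKey));
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(query("CC1")); // [1,10.0,PartyA, 2,-4.0,PartyB]
    }
}
```

Stage 1 never touches the network because the dimensions are replicated to every node; Stage 2 is a single step because key association collocates the facts.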
Bringing it together
Java client API over Replicated Dimensions and Partitioned Facts.
We never have to do a distributed join. So all the big stuff is held partitioned, and we can join without shipping keys around and having intermediate results.
We get to do this…
…and this…
…and this…
…without the problems of this…
…or this…
…all at the speed of this… well, almost.
But there is a fly in the ointment…
I lied earlier: these aren't all Facts.
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.
Fortunately there is a simple solution: the Connected Replication Pattern.
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".
If there are no Trades for Goldmans in the data store, then a Trade Query will never need the Goldmans Counterparty.
Looking at the Dimension data, some are quite large. But Connected Dimension Data is tiny by comparison.
One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'Connected' (or 'Used') dimensions.
As data is written to the data store we keep our 'Connected Caches' up to date.
Data Layer: Fact Storage (Partitioned) holding Transactions, Cashflows, Mtms; Processing Layer: Dimension Caches (Replicated).
As new Facts are added, relevant Dimensions that they reference are moved to the processing-layer caches.
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.
Saving a trade causes all its first-level references to be triggered.
Save Trade => Partitioned Cache => Cache Store => Trigger.
Data Layer (all normalised): Trade, Party, Alias, Source, Book, Ccy. Query Layer: with connected dimension caches.
This updates the connected caches.
The process recurses through the object graph (Trade => Party => LedgerBook, and so on), from the Data Layer (all normalised) into the Query Layer (with connected dimension caches).
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
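The pattern can be sketched as a recursion over foreign-key arcs (an illustrative model, not the ODC implementation): saving a fact triggers replication of every dimension reachable from it, and nothing else.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Connected Replication sketch: only dimensions reachable from saved
// facts are copied into the replicated layer.
public class ConnectedReplication {
    // Foreign-key arcs of the domain model: entity -> entities it references.
    static Map<String, List<String>> refs = Map.of(
        "Trade:1", List.of("Party:A", "Book:B"),
        "Party:A", List.of("LedgerBook:L"),
        "Book:B",  List.of("Ccy:USD"));

    static Set<String> replicatedLayer = new HashSet<>();

    // Triggered on save: recurse through the arcs, replicating each
    // connected dimension exactly once.
    static void onSave(String entity) {
        for (String ref : refs.getOrDefault(entity, List.of())) {
            if (replicatedLayer.add(ref)) onSave(ref); // recurse only on first sight
        }
    }

    public static void main(String[] args) {
        onSave("Trade:1");
        // Only dimensions reachable from the saved trade are replicated.
        System.out.println(replicatedLayer.contains("Ccy:USD"));       // true
        System.out.println(replicatedLayer.contains("Party:Goldman")); // false: unconnected
    }
}
```

An unconnected counterparty like the Goldmans example never enters the replicated layer, which is where the order-of-magnitude space saving comes from.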
Limitations of this approach
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).
Conclusion
• Traditional database architectures are inappropriate for very low-latency or very high-throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, we can do any join in a single step…
• …with a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
ODC
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
ODC represents a
balance between
throughput and latency
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory?
In Memory + Plentiful hardware => Latency + Throughput
ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record (persistence)
2TB of RAM
The Layers
[Diagram: Access Layer (Java client APIs) over a Query Layer and Data Layer holding Transactions, Cashflows and Mtms, backed by a Persistence Layer]
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scales [diagram: keys Aa-Ap assigned to one partition]
Scalable storage, bandwidth and processing
Associating data in different partitions implies moving it
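The trade-off between the two tools can be shown in a few lines. This is a toy sketch, not ODC code: the class and method names are illustrative. A partitioned store routes each key to exactly one node; a replicated store copies every entry to all nodes.

```python
class PartitionedStore:
    """Each key lives on exactly one node, so storage scales with the cluster."""
    def __init__(self, n_nodes):
        self.nodes = [dict() for _ in range(n_nodes)]

    def owner(self, key):
        return hash(key) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self.owner(key)][key] = value

    def get(self, key):
        # In a real grid this may be a network hop: only the owner holds the key.
        return self.nodes[self.owner(key)].get(key)


class ReplicatedStore:
    """Every key lives on every node: reads are always local,
    but capacity is capped by a single node's memory."""
    def __init__(self, n_nodes):
        self.nodes = [dict() for _ in range(n_nodes)]

    def put(self, key, value):
        for node in self.nodes:          # write amplification: one copy per node
            node[key] = value

    def get_local(self, node_id, key):   # any node can serve the read locally
        return self.nodes[node_id].get(key)
```

The asymmetry is the whole point of the next few slides: partitioning gives scalable storage but local reads only for collocated keys; replication gives local reads everywhere but bounded capacity.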
So we have some data. Our data is bound together in a model
[Diagram: object model linking Trade, Party and Trader, with sub-entities such as Desk and Name]
Which we save
[Diagram: Trade, Party and Trader entities saved across nodes]
Binding them back together involves a "distributed join" => lots of network hops
[Diagram: Trade, Party and Trader on separate nodes, joined over the network]
The hops have to be spread over time
[Diagram: hops laid out along network and time axes]
Lots of network hops makes it slow
OK – what if we held it all together, "Denormalised"?
Hence denormalisation is FAST (for reads)
Denormalisation implies the duplication of some sub-entities
…and that means managing consistency over lots of copies
…and all the duplication means you run out of space really quickly
Space issues are exaggerated further when data is versioned
[Diagram: Trade, Party and Trader duplicated across Versions 1 to 4]
…and you need versioning to do MVCC
And reconstituting a previous time slice becomes very difficult
[Diagram: scattered Trade, Party and Trader versions to be reassembled]
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
[Diagram: Trade, Party and Trader held separately across nodes]
Independently Versioned
Data is Singleton
Binding them back together involves a "distributed join" => lots of network hops
[Diagram: Trade, Party and Trader on separate nodes, joined over the network]
Whereas in the denormalised model the join is already done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?
It's all about the keys
We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate
[Diagram: common keys vs crosscutting keys]
We tackle this problem with a hybrid model
[Diagram: Trade partitioned; Party and Trader replicated]
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are big; dimensions are small
Facts have one key that relates them all (used to partition)
Dimensions have many keys (which crosscut the partitioning key)
Looking at the data:
Facts => big, common keys
Dimensions => small, crosscutting keys
We remember we are a grid. We should avoid the distributed join
… so we only want to 'join' data that is in the same process
[Diagram: Trades and MTMs share a Common Key]
Use a Key Assignment Policy (e.g. KeyAssociation in Coherence)
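A minimal sketch of key affinity, the idea behind a key assignment policy such as Coherence's KeyAssociation (the real mechanism is a Java interface; this is a Python illustration). The key formats here ("Trade:&lt;id&gt;", "MTM:&lt;id&gt;:&lt;n&gt;") are hypothetical; the point is that facts sharing a trade id derive the same partition, so joining them never leaves the process.

```python
N_PARTITIONS = 8  # illustrative cluster size

def associated_key(key: str) -> str:
    """Route Trade and MTM keys by the trade id they share;
    other keys route by their own value."""
    if key.startswith(("Trade:", "MTM:")):
        return key.split(":")[1]   # the common trade id
    return key

def partition_of(key: str) -> int:
    # Partition is derived from the *associated* key, not the raw key,
    # so Trade:42 and every MTM:42:* land in the same partition.
    return hash(associated_key(key)) % N_PARTITIONS
```

For example, `partition_of("Trade:42")` always equals `partition_of("MTM:42:1")`, which is exactly the collocation the cluster join in the next slides relies on.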
So we prescribe different physical storage for Facts and Dimensions
[Diagram: Trade partitioned; Party and Trader replicated]
Facts are partitioned, dimensions are replicated
[Diagram: Data Layer and Query Layer, with Transactions, Cashflows and Mtms in partitioned Fact Storage]
Facts are partitioned, dimensions are replicated
[Diagram: Transactions, Cashflows and Mtms as Facts (distributed/partitioned), plus Dimensions (replicated)]
The data volumes back this up as a sensible hypothesis
Facts => big => distribute
Dimensions => small => replicate
Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key
[Diagram: Replicate vs Distribute]
So how does this help us to run queries without distributed joins?
This query involves:
• Joins between Dimensions
• Joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern?
[Diagram: a chain of hops spread over network and time: Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centers]
But by balancing Replication and Partitioning, we don't need all those hops
Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'
[Diagram: Transactions, Cashflows and Mtms in Partitioned Storage]
Stage 1: Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
[Diagram: Transactions, Cashflows and Mtms in Partitioned Storage]
Stage 2: Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 2: Join the facts together efficiently, as we know they are collocated
[Diagram: Transactions, Cashflows and Mtms in Partitioned Storage]
Stage 3: Augment raw Facts with relevant Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 3: Bind relevant dimensions to the result
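The three stages just described can be sketched end to end on toy in-process data. Everything here is hypothetical (the tables, keys and field names are invented): dimensions (cost centres, books) are assumed locally replicated, while facts (transactions, MTMs) are partitioned and collocated by trade id.

```python
# Replicated dimension data: every query node holds a full copy.
cost_centres = {"CC1": ["B1", "B2"]}           # cost centre -> books
books = {"B1": ["T1"], "B2": ["T2"]}           # book -> trade ids

# Partitioned fact data, keyed by the shared trade id.
transactions = {"T1": {"amount": 100}, "T2": {"amount": 200}}
mtms = {"T1": {"mtm": 5}, "T2": {"mtm": -3}}

def query(cost_centre):
    # Stage 1: join dimensions locally, turning the where clause
    # into the set of fact keys to fetch.
    trade_ids = [t for book in cost_centres[cost_centre] for t in books[book]]

    # Stage 2: join the facts. They share a partitioning key, so in the
    # grid each transaction/MTM join happens inside a single process.
    rows = [{"trade": t, **transactions[t], **mtms[t]} for t in trade_ids]

    # Stage 3: bind the remaining dimension fields to the result (local again).
    for row in rows:
        row["cost_centre"] = cost_centre
    return rows
```

No step ships keys between nodes or builds a distributed intermediate result: stages 1 and 3 run against the local replicas, and stage 2 fans out to partitions that each join their own facts.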
Bringing it together
[Diagram: Java client API over Replicated Dimensions and Partitioned Facts]
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and having intermediate results
We get to do this…
[Diagram: normalised Trade, Party and Trader held separately]
…and this…
[Diagram: Trade, Party and Trader across Versions 1 to 4]
…and this
[Diagram: reconstituting a previous time slice]
…without the problems of this…
…or this
…all at the speed of this… well, almost
But there is a fly in the ointment…
I lied earlier. These aren't all Facts
[Diagram: Facts vs Dimensions]
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication Pattern
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"
If there are no Trades for Goldmans in the data store, then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data, some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80% of data remains unused
So we only replicate 'Connected' or 'Used' dimensions
As data is written to the data store, we keep our 'Connected Caches' up to date
[Diagram: Data Layer with Transactions, Cashflows and Mtms in partitioned Fact Storage; Processing Layer with replicated Dimension Caches]
As new Facts are added, relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all its 1st-level references to be triggered
[Diagram: Save Trade hits the Partitioned Cache in the Data Layer (all normalised); a Cache Store trigger fires for the referenced Party, Alias, Source, Book and Ccy]
This updates the connected caches
[Diagram: Party, Alias, Source, Book and Ccy copied to the Query Layer's connected dimension caches]
The process recurses through the object graph
[Diagram: recursion continues from Party and Book to LedgerBook and further references]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
With 'Connected Replication', only 1/10th of the data needs to be replicated (on average)
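The recursion through the model's arcs can be sketched in a few lines. This is an illustration over an invented domain graph, not the ODC trigger code: when a fact is saved, walk its foreign keys and mark every reachable dimension as connected, so only those get replicated.

```python
# dimension id -> ids of the dimensions it references (the model's arcs)
references = {
    "party:goldman": ["alias:gs"],
    "alias:gs": [],
    "book:b1": ["ledgerbook:l1"],
    "ledgerbook:l1": [],
    "ccy:usd": [],
}

connected = set()    # stands in for the replicated 'connected caches'

def replicate(dim_id):
    if dim_id in connected:
        return                        # already replicated: stop recursing
    connected.add(dim_id)
    for ref in references.get(dim_id, []):
        replicate(ref)                # recurse through the graph's arcs

def on_fact_saved(dimension_ids):
    """Trigger fired when a fact (e.g. a Trade) is written,
    passing the dimensions the fact references directly."""
    for d in dimension_ids:
        replicate(d)
```

A dimension that no stored fact ever reaches (here `book:b1`) is simply never replicated, which is where the order-of-magnitude space saving comes from.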
Limitations of this approach
• Data set size: the size of connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications
• At one end of the scale are the huge shared nothing architectures. These favour scalability
• At the other end are in-memory architectures, ideally using a single address space
• You can blend the two approaches (for example ODC)
• ODC attacks the Distributed Join Problem in an unusual way
• By balancing Replication and Partitioning, so we can do any join in a single step
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern
The End
• Further details online: http://www.benstopford.com
• Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
What is Latency
Latency is a measure of response time
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
What is Throughput
Throughput is a measure of the consumption of workmessages in a prescribed amount of time
Which is best for latency
Latency
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
[Diagram: four versions of the Trade/Party/Trader graph, each version duplicating the whole denormalised copy]
…and you need versioning to do MVCC
And reconstituting a previous time slice becomes very difficult
[Diagram: scattered Trade, Party and Trader copies that must be matched up to rebuild a consistent slice]
So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Each entity is independently versioned, and data is singleton (exactly one copy of each entity)
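The payoff of singleton, independently versioned entities is that a historical time slice is just “the latest version of each entity at or before time t”. A minimal sketch of that idea (plain Java, all names and data hypothetical):

```java
import java.util.*;

// Sketch of independently versioned, normalised entities (names hypothetical).
// Each entity is a singleton with its own version history, so reconstituting
// a time slice needs no reconciliation of duplicated copies.
public class VersionedStoreSketch {
    // entity id -> (version timestamp -> value)
    final Map<String, TreeMap<Long, String>> store = new HashMap<>();

    void put(String id, long time, String value) {
        store.computeIfAbsent(id, k -> new TreeMap<>()).put(time, value);
    }

    // The view of the world at a point in time: for each entity, the
    // newest version written at or before that instant.
    Map<String, String> sliceAt(long time) {
        Map<String, String> slice = new HashMap<>();
        store.forEach((id, versions) -> {
            Map.Entry<Long, String> e = versions.floorEntry(time);
            if (e != null) slice.put(id, e.getValue());
        });
        return slice;
    }

    public static void main(String[] args) {
        VersionedStoreSketch s = new VersionedStoreSketch();
        s.put("trade:T1", 1, "v1");
        s.put("trade:T1", 5, "v2");
        s.put("party:P1", 2, "v1");
        System.out.println(s.sliceAt(3)); // trade:T1 at v1, party:P1 at v1
    }
}
```

Contrast this with the denormalised picture above, where every version duplicated the whole graph.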
Binding them back together involves a “distributed join” => lots of network hops
Whereas in the denormalised model the join is already done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern is all about
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can’t we hold it all together?
It’s all about the keys
We can collocate data with common keys, but if the keys crosscut, the only way to collocate is to replicate
We tackle this problem with a hybrid model
[Diagram: Trade is partitioned; Party and Trader are replicated]
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades, for us)
Facts are big; dimensions are small
Facts have one key that relates them all (used to partition)
Dimensions have many keys (which crosscut the partitioning key)
Looking at the data:
Facts => big, common keys
Dimensions => small, crosscutting keys
We remember we are a grid: we should avoid the distributed join
… so we only want to ‘join’ data that is in the same process
Trades and MTMs share a common key, so use a Key Assignment Policy (e.g. KeyAssociation in Coherence)
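Coherence’s KeyAssociation routes an entry by an “associated key” rather than its own key. A Coherence-free sketch of the idea (plain Java, all names hypothetical): Trades partition by their own id, while MTMs partition by the trade id they reference, so each Trade and its MTMs always land in the same partition and can be joined in-process.

```java
// Sketch of key association / data affinity (plain Java, no Coherence
// dependency; all names hypothetical).
public class KeyAssociationSketch {
    // Route by an "associated key" rather than the entry's own key.
    static int partitionOf(String associatedKey, int nodeCount) {
        return Math.floorMod(associatedKey.hashCode(), nodeCount);
    }

    // A Trade's association key is its own id...
    static String tradeAssociatedKey(String tradeId) {
        return tradeId;
    }

    // ...but an MTM's association key is the trade id it references,
    // not its own id, so it collocates with its Trade.
    static String mtmAssociatedKey(String mtmId, String tradeId) {
        return tradeId;
    }

    public static void main(String[] args) {
        int nodes = 8;
        int tradePartition = partitionOf(tradeAssociatedKey("T42"), nodes);
        int mtmPartition = partitionOf(mtmAssociatedKey("M7", "T42"), nodes);
        // Same string hashes to the same partition: the join stays local.
        System.out.println(tradePartition == mtmPartition); // true
    }
}
```

In real Coherence the same effect comes from implementing KeyAssociation (or a KeyAssociator) on the cache key class.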
So we prescribe different physical storage for Facts and Dimensions: Facts are partitioned; dimensions are replicated
[Diagram: Fact Storage (partitioned) holds Transactions, Cashflows and Mtms; Dimensions such as Party and Trader are replicated. Facts => distribute/partition; Dimensions => replicate]
The data volumes back this up as a sensible hypothesis
Facts => big => distribute
Dimensions => small => replicate
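The hybrid storage prescription can be sketched directly (plain Java, all names hypothetical): a Fact is stored on exactly one node, while a Dimension is copied onto every node, so any fact-to-dimension join is always local.

```java
import java.util.*;

// Sketch of the hybrid model (names hypothetical): Facts are partitioned
// across nodes; Dimensions are replicated onto every node.
public class HybridStoreSketch {
    final int nodeCount;
    final List<Map<String, String>> factsByNode = new ArrayList<>(); // a slice each
    final List<Map<String, String>> dimsByNode = new ArrayList<>();  // a full copy each

    HybridStoreSketch(int nodeCount) {
        this.nodeCount = nodeCount;
        for (int i = 0; i < nodeCount; i++) {
            factsByNode.add(new HashMap<>());
            dimsByNode.add(new HashMap<>());
        }
    }

    void putFact(String key, String value) {
        // A fact lives on exactly one node: storage scales with the cluster
        factsByNode.get(Math.floorMod(key.hashCode(), nodeCount)).put(key, value);
    }

    void putDimension(String key, String value) {
        // A dimension is copied to every node: joins to it are always local
        for (Map<String, String> node : dimsByNode) node.put(key, value);
    }

    long nodesHoldingFact(String key) {
        return factsByNode.stream().filter(n -> n.containsKey(key)).count();
    }

    long nodesHoldingDimension(String key) {
        return dimsByNode.stream().filter(n -> n.containsKey(key)).count();
    }

    public static void main(String[] args) {
        HybridStoreSketch store = new HybridStoreSketch(4);
        store.putFact("trade:T1", "...");
        store.putDimension("ccy:USD", "...");
        System.out.println(store.nodesHoldingFact("trade:T1"));     // 1
        System.out.println(store.nodesHoldingDimension("ccy:USD")); // 4
    }
}
```

The memory cost of replication is visible here too: each dimension costs nodeCount copies, which is why only small (or, later, “connected”) dimensions get this treatment.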
Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can’t map to our partitioning key
So how do these help us to run queries without distributed joins?
This query involves:
• Joins between Dimensions
• Joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’
What would this look like without this pattern?
[Diagram: a chain of sequential lookups – Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs – each one a network hop, spread out over time]
But by balancing Replication and Partitioning we don’t need all those hops
Stage 1: Focus on the where clause: Where Cost Centre = ‘CC1’
Stage 1: Join Dimensions in the Query Layer to get the right keys to query the Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’
Stage 2: Cluster join to get the Facts – join the facts together efficiently, as we know they are collocated
Stage 3: Augment the raw Facts with the relevant Dimensions – bind them to the result in the Query Layer
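The three stages can be condensed into a single-process sketch (plain Java; all table names and data hypothetical, and the cluster collapsed to local maps for brevity): stage 1 resolves the where clause against replicated dimensions, stage 2 fetches the matching facts, stage 3 binds dimension data onto the result.

```java
import java.util.*;

// Sketch of the three query stages (names and data hypothetical,
// single process standing in for the cluster).
public class QueryStagesSketch {
    // Replicated dimension: book -> cost centre
    static final Map<String, String> COST_CENTRE_BY_BOOK =
            Map.of("B1", "CC1", "B2", "CC2");
    // Partitioned facts: transaction id -> book
    static final Map<String, String> BOOK_BY_TRANSACTION =
            Map.of("Tx1", "B1", "Tx2", "B2", "Tx3", "B1");

    static List<String> query(String costCentre) {
        // Stage 1: join Dimensions locally to turn the where clause into keys
        Set<String> books = new HashSet<>();
        COST_CENTRE_BY_BOOK.forEach((book, cc) -> {
            if (cc.equals(costCentre)) books.add(book);
        });
        // Stage 2: fetch the matching Facts (collocated in the real system)
        List<String> result = new ArrayList<>();
        BOOK_BY_TRANSACTION.forEach((tx, book) -> {
            if (books.contains(book))
                // Stage 3: bind the relevant dimension data onto the result
                result.add(tx + ":" + COST_CENTRE_BY_BOOK.get(book));
        });
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(query("CC1")); // [Tx1:CC1, Tx3:CC1]
    }
}
```

The point of the pattern is that stage 1 and stage 3 never leave the process, and stage 2 only touches collocated partitions.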
Bringing it together
[Diagram: Java client API sitting over Replicated Dimensions and Partitioned Facts]
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and having intermediate results
We get to do this… and this… and this… (the normalised joins, the versioning, the time-slice reconstruction) …without the problems of this… or this… (the duplication and consistency management) – all at the speed of this… well, almost
But there is a fly in the ointment…
I lied earlier: these aren’t all Facts
This is a dimension:
• It has a different key to the Facts
• And it’s BIG
We can’t replicate really big stuff… we’ll run out of space => Big Dimensions are a problem
Fortunately there is a simple solution: the Connected Replication Pattern
Whilst there are lots of these big dimensions, a large majority are never used – they are not all “connected”
If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty
Looking at the Dimension data, some are quite large
But Connected Dimension data is tiny by comparison
One recent independent study from the database community showed that 80% of data remains unused
So we only replicate ‘Connected’ or ‘Used’ dimensions
As data is written to the data store we keep our ‘Connected Caches’ up to date
[Diagram: Processing Layer with replicated Dimension Caches above the Data Layer’s partitioned Fact Storage (Transactions, Cashflows, Mtms)]
As new Facts are added, the relevant Dimensions that they reference are moved to processing-layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all its 1st-level references to be triggered
[Diagram: Save Trade hits the Partitioned Cache in the Data Layer (all normalised); a Cache Store trigger pushes Party, Alias, Source, Book and Ccy into the Query Layer’s connected dimension caches]
This updates the connected caches
The process recurses through the object graph
[Diagram: the recursion continues from Party to LedgerBook, updating the Query Layer’s connected dimension caches from the fully normalised Data Layer]
‘Connected Replication’: a simple pattern which recurses through the foreign keys in the domain model, ensuring only ‘Connected’ dimensions are replicated
With ‘Connected Replication’ only 1/10th of the data needs to be replicated (on average)
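The recursion just described amounts to a graph walk over the domain model’s foreign keys, replicating only what it reaches. A minimal sketch (plain Java; all names and the reference data are hypothetical):

```java
import java.util.*;

// Sketch of Connected Replication (names hypothetical): when a Fact is
// saved, walk its foreign keys recursively and replicate only the
// Dimensions actually reached.
public class ConnectedReplicationSketch {
    // dimension id -> dimensions it references (the domain model's arcs)
    static final Map<String, List<String>> REFERENCES = Map.of(
            "book:B1", List.of("party:P1"),
            "party:P1", List.of("ledgerBook:L1"),
            "ledgerBook:L1", List.of(),
            "ccy:USD", List.of(),
            "party:P9", List.of()); // never referenced by a fact

    static Set<String> connectedDimensions(List<String> factRefs) {
        Set<String> connected = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(factRefs);
        while (!pending.isEmpty()) {
            String dim = pending.pop();
            if (connected.add(dim)) // replicate this dimension once...
                pending.addAll(REFERENCES.getOrDefault(dim, List.of())); // ...then recurse
        }
        return connected;
    }

    public static void main(String[] args) {
        // Saving a trade that references a book and a currency:
        System.out.println(connectedDimensions(List.of("book:B1", "ccy:USD")));
        // party:P9 is never reached, so it is never replicated
    }
}
```

The `connected.add` check is what keeps the walk terminating on cyclic or shared references, and the unreached `party:P9` illustrates why only ~1/10th of the dimension data ends up replicated.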
Limitations of this approach
• Data set size: the size of connected dimensions limits scalability
• Joins are only supported between “Facts” that can share a partitioning key (but any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low-latency or very high-throughput applications
• At one end of the scale are the huge shared-nothing architectures; these favour scalability
• At the other end are in-memory architectures, ideally using a single address space
• You can blend the two approaches (for example, ODC)
• ODC attacks the Distributed Join Problem in an unusual way
• By balancing Replication and Partitioning we can do any join in a single step
• With a ‘twist’ that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern
The End
• Further details online: http://www.benstopford.com
• Questions?
Which is best for throughput
Throughput
Traditional
Database
Shared Nothing (Distribut
ed) In-Memory
Database
So why do we use distributed in-memory
In Memory
Plentiful hardwar
e
Latency Throughput
ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
Realtime Graph DB
450 processes
Messaging (Topic Based) as a system of record
(persistence)
2TB of RAM
The LayersD
ata
Layer Transactio
ns
Cashflows
Query
Layer
Mtms
Acc
ess
La
yer
Java client
API
Java client
API
Pers
iste
nce
Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
Remember we are a grid: we should avoid the distributed join… so we only want to 'join' data that is in the same process
[Diagram: Trades and MTMs collocated via a common key]
Use a key-assignment policy (e.g. KeyAssociation in Coherence)
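The key-assignment idea can be sketched in plain Java. This is not the Coherence API, just a minimal illustration (all names hypothetical) of routing every key by an associated trade key, so Facts sharing that key land in the same partition and can be joined in-process.

```java
import java.util.Objects;

// Hypothetical sketch: route every key by its *associated* trade key,
// so a Trade and its MTMs always land in the same partition.
public class KeyAssignmentSketch {

    static final int PARTITIONS = 16;

    // In the spirit of Coherence's KeyAssociation: a key exposes the key
    // it should be collocated with.
    interface Associated {
        Object associatedKey();
    }

    record TradeKey(String tradeId) implements Associated {
        public Object associatedKey() { return tradeId; }  // facts partition by trade id
    }

    record MtmKey(String mtmId, String tradeId) implements Associated {
        public Object associatedKey() { return tradeId; }  // crosscutting key defers to trade id
    }

    // The partitioning function hashes the associated key, not the raw key.
    static int partitionFor(Associated key) {
        return Math.floorMod(Objects.hashCode(key.associatedKey()), PARTITIONS);
    }

    public static void main(String[] args) {
        TradeKey trade = new TradeKey("T42");
        MtmKey mtm = new MtmKey("M1", "T42");
        // Both resolve to the same partition => the join is process-local.
        System.out.println(partitionFor(trade) == partitionFor(mtm));
    }
}
```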
So we prescribe different physical storage for Facts and Dimensions: Trade is partitioned; Party and Trader are replicated
Facts are partitioned; dimensions are replicated
[Diagram: a Query Layer above a Data Layer in which Transactions, Cashflows and MTMs sit in partitioned Fact Storage (Facts distributed by partition) while Dimensions are replicated]
The data volumes back this up as a sensible hypothesis: Facts => big => distribute; Dimensions => small => replicate
Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate the small stuff whose keys can't map to our partitioning key
So how does this help us run queries without distributed joins?
This query involves:
• Joins between Dimensions
• Joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern?
[Diagram: a sequence of network hops spread over time: Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centers]
But by balancing Replication and Partitioning we don't need all those hops
Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'
[Diagram: Transactions, Cashflows and MTMs in Partitioned Storage]
Stage 1: Get the right keys to query the Facts by joining Dimensions in the Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 2: Cluster join to get the Facts: join Facts across the cluster
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 2: Join the facts together efficiently, as we know they are collocated
[Diagram: Transactions, Cashflows and MTMs in Partitioned Storage]
Stage 3: Augment the raw Facts with the relevant Dimensions, joined in the Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 3: Bind the relevant dimensions to the result
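The three stages can be sketched as one hypothetical Java method (all names invented): replicated maps stand in for the Query Layer's dimension caches, and maps keyed by trade id stand in for the partitioned fact storage.

```java
import java.util.*;

// Hypothetical sketch of the three-stage query plan.
public class ThreeStageQuery {

    // Replicated dimensions: every node holds all of these (they are small).
    static final Map<String, String> costCentreToLedgerBook = Map.of("CC1", "LB7");
    static final Map<String, Set<String>> ledgerBookToTradeIds =
            Map.of("LB7", Set.of("T1", "T2"));

    // Partitioned facts: transactions and MTMs collocated by trade id.
    record Transaction(String tradeId, double amount) {}
    record Mtm(String tradeId, double value) {}
    static final Map<String, List<Transaction>> transactionsByTrade = Map.of(
            "T1", List.of(new Transaction("T1", 100.0)),
            "T2", List.of(new Transaction("T2", 250.0)));
    static final Map<String, List<Mtm>> mtmsByTrade = Map.of(
            "T1", List.of(new Mtm("T1", 99.5)),
            "T2", List.of(new Mtm("T2", 251.2)));

    record Row(Transaction tx, Mtm mtm, String ledgerBook) {}

    static List<Row> query(String costCentre) {
        // Stage 1: join dimensions locally to turn the where clause into fact keys.
        String ledgerBook = costCentreToLedgerBook.get(costCentre);
        if (ledgerBook == null) return List.of();
        Set<String> tradeIds = ledgerBookToTradeIds.getOrDefault(ledgerBook, Set.of());

        // Stage 2: join the facts; collocation means each trade id joins in-process.
        List<Row> rows = new ArrayList<>();
        for (String tradeId : tradeIds)
            for (Transaction tx : transactionsByTrade.getOrDefault(tradeId, List.of()))
                for (Mtm mtm : mtmsByTrade.getOrDefault(tradeId, List.of()))
                    // Stage 3: bind the relevant dimension data to the result.
                    rows.add(new Row(tx, mtm, ledgerBook));
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(query("CC1").size()); // 2: one row per trade
    }
}
```

No key ever leaves the node holding the facts; only the small dimension lookups happen up front.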
Bringing it together
[Diagram: Java client API over Replicated Dimensions and Partitioned Facts]
We never have to do a distributed join
So all the big stuff is held partitioned, and we can join without shipping keys around and having intermediate results
We get to do this… [entities held normalised]
…and this… [versions 1 to 4 of the Trade, Party, Trader graph]
…and this [reconstituting a previous time slice]
…without the problems of this… or this, all at the speed of this… well, almost
But there is a fly in the ointment… I lied earlier: these aren't all Facts
[Diagram: Facts vs Dimensions]
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem
Fortunately there is a simple solution: the Connected Replication Pattern
Whilst there are lots of these big dimensions, the large majority are never used. They are not all "connected"
If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty
Looking at the Dimension data, some are quite large
But the Connected Dimension data is tiny by comparison
One recent independent study from the database community showed that 80% of data remains unused
So we only replicate 'Connected' or 'Used' dimensions
As data is written to the data store we keep our 'Connected Caches' up to date
[Diagram: a Processing Layer holding replicated Dimension Caches above a Data Layer holding Transactions, Cashflows and MTMs in partitioned Fact Storage]
As new Facts are added, the relevant Dimensions that they reference are moved to the processing-layer caches
The Replicated Layer is updated by recursing through the arcs of the domain model when facts change
Saving a trade causes all its 1st-level references to be triggered
[Diagram: Save Trade hits the Partitioned Cache in the Data Layer (all normalised); a Cache Store trigger fires for the Trade's references (Party, Alias, Source, Book, Ccy) toward the Query Layer's connected dimension caches]
This updates the connected caches
[Diagram: Party, Alias, Source, Book and Ccy copied from the Data Layer (all normalised) into the Query Layer's connected dimension caches]
The process recurses through the object graph
[Diagram: the recursion continues from Party on to LedgerBook, from the Data Layer (all normalised) into the Query Layer's connected dimension caches]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)
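A minimal, hypothetical Java sketch of the recursion (all names invented; the slides do not show ODC's actual implementation): on save, walk the fact's foreign keys and copy only the dimensions actually reached into the replicated cache.

```java
import java.util.*;

// Hypothetical sketch of Connected Replication: when a fact is saved,
// recurse through its foreign keys and replicate only the dimensions
// it actually touches.
public class ConnectedReplicationSketch {

    record Dimension(String key, List<String> references) {}

    // Normalised dimension store in the data layer (keyed by dimension key).
    static final Map<String, Dimension> dataLayer = Map.of(
            "party:GS",  new Dimension("party:GS", List.of("book:B1")),
            "book:B1",   new Dimension("book:B1", List.of("ccy:USD")),
            "ccy:USD",   new Dimension("ccy:USD", List.of()),
            "party:UBS", new Dimension("party:UBS", List.of()));  // never referenced

    // The replicated connected cache held in the query layer.
    static final Set<String> connectedCache = new HashSet<>();

    // Triggered on save of a fact: recurse through the arcs of the graph.
    static void onFactSaved(List<String> firstLevelRefs) {
        for (String key : firstLevelRefs) replicate(key);
    }

    static void replicate(String key) {
        if (!connectedCache.add(key)) return;   // already connected: stop recursing
        Dimension d = dataLayer.get(key);
        if (d != null) d.references().forEach(ConnectedReplicationSketch::replicate);
    }

    public static void main(String[] args) {
        onFactSaved(List.of("party:GS"));       // saving a trade against Goldmans
        System.out.println(connectedCache);     // party:GS, book:B1, ccy:USD; not party:UBS
    }
}
```

The `connectedCache.add` guard both deduplicates and terminates the recursion, so cyclic references in the domain model are safe.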
Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications
• At one end of the scale are the huge shared-nothing architectures; these favour scalability
• At the other end are in-memory architectures, ideally using a single address space
• You can blend the two approaches (for example ODC)
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning we can do any join in a single step
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern
The End
• Further details online: http://www.benstopford.com
• Questions
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
The Layers
(diagram) Access Layer: Java client, API
Query Layer
Data Layer: Transactions, Cashflows, Mtms
Persistence Layer
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools?
Replication puts data everywhere
Wherever you go, the data will be there
But your storage is limited by the memory on a node
Partitioning scales: each node owns a range of keys (e.g. keys Aa-Ap)
Scalable storage, bandwidth and processing
Associating data in different partitions implies moving it
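The contrast between the two tools can be sketched in a few lines of plain Java (the node count and key names here are made up for illustration):

```java
import java.util.List;

// Toy contrast of the two tools. Replication puts a full copy of the data on
// every node: reads are always local, but capacity is capped by one node's
// memory. Partitioning gives each node a slice of the key space: storage and
// bandwidth scale with the cluster, but keys on different nodes can only be
// associated by moving data over the network.
public class ReplicationVsPartitioning {

    // Partitioning: each key lives on exactly one node, chosen by hash.
    static int nodeFor(String key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        int nodes = 3;
        List<String> keys = List.of("TradeT1", "PartyP1", "TraderX");

        for (String k : keys) {
            System.out.println(k + ": replicated -> on all " + nodes
                    + " nodes; partitioned -> only on node " + nodeFor(k, nodes));
        }
    }
}
```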
So we have some data. Our data is bound together in a model
(diagram: Trade linked to Party and Trader, with further sub-entities such as Desk and Name)
Which we save
(diagram: the Trade, Party and Trader objects end up scattered across the cluster)
Binding them back together involves a "distributed join" => lots of network hops
(diagram: each Trade chasing its Party and Trader references across nodes)
The hops have to be spread over time (network/time diagram)
Lots of network hops make it slow
OK - what if we held it all together, "denormalised"?
Hence denormalisation is FAST (for reads)
Denormalisation implies the duplication of some sub-entities
…and that means managing consistency over lots of copies
…and all the duplication means you run out of space really quickly
Space issues are exacerbated further when data is versioned
(diagram: the Trade/Party/Trader graph copied in full for Version 1, 2, 3 and 4)
…and you need versioning to do MVCC
And reconstituting a previous time slice becomes very difficult
(diagram: picking the matching versions of Trade, Party and Trader out of many copies)
So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage
Remember, this means the object graph will be split across multiple machines
(diagram: Trade, Party and Trader held apart, independently versioned; each datum is a singleton)
Binding them back together involves a "distributed join" => lots of network hops
Whereas in the denormalised model the join is already done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern is all about
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?
It's all about the keys
We can collocate data with common keys, but if the keys crosscut, the only way to collocate is to replicate
(diagram: common keys vs crosscutting keys)
We tackle this problem with a hybrid model
(diagram: Trade partitioned; Party and Trader replicated)
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are big; dimensions are small
Facts have one key that relates them all (used to partition)
Dimensions have many keys (which crosscut the partitioning key)
Looking at the data:
Facts => big, common keys
Dimensions => small, crosscutting keys
We remember we are a grid. We should avoid the distributed join
… so we only want to 'join' data that is in the same process
(diagram: Trades and MTMs collocated via their common key)
Use a Key Assignment Policy (e.g. KeyAssociation in Coherence)
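Coherence's KeyAssociation lets a key declare another key it should be stored with, so related entries land in the same partition. A minimal plain-Java sketch of the idea, with no Coherence dependency (the TradeKey/MtmKey classes and the partitionOf routine are illustrative assumptions, not the Coherence API):

```java
import java.util.Objects;

// Sketch of the key-association idea: an MTM's key declares the Trade key it
// belongs with, and the partitioner routes on that associated key, so a Trade
// and all of its MTMs land in the same partition and can be joined locally.
public class KeyAssociationSketch {

    // Mirrors the spirit of Coherence's KeyAssociation interface.
    interface Associated {
        Object getAssociatedKey();
    }

    static final class TradeKey implements Associated {
        final String tradeId;
        TradeKey(String tradeId) { this.tradeId = tradeId; }
        public Object getAssociatedKey() { return tradeId; } // partitions by itself
    }

    static final class MtmKey implements Associated {
        final String mtmId;
        final String tradeId; // the common key shared with the parent Trade
        MtmKey(String mtmId, String tradeId) { this.mtmId = mtmId; this.tradeId = tradeId; }
        public Object getAssociatedKey() { return tradeId; } // partitions by its Trade
    }

    // Route on the associated key, not the raw key.
    static int partitionOf(Associated key, int partitionCount) {
        return Math.floorMod(Objects.hashCode(key.getAssociatedKey()), partitionCount);
    }

    public static void main(String[] args) {
        int parts = 13;
        int tradePart = partitionOf(new TradeKey("T42"), parts);
        int mtmPart   = partitionOf(new MtmKey("M1", "T42"), parts);
        System.out.println(tradePart == mtmPart); // prints true: collocated
    }
}
```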
So we prescribe different physical storage for Facts and Dimensions
(diagram: Trade partitioned; Party and Trader replicated)
Facts are partitioned; dimensions are replicated
(diagram: Data Layer holding Fact Storage (Partitioned): Transactions, Cashflows, Mtms; a Query Layer above; the Trade/Party/Trader schema inset)
Facts are partitioned; dimensions are replicated
(diagram: Facts (distribute/partition) in Fact Storage (Partitioned): Transactions, Cashflows, Mtms; Dimensions (replicate) alongside)
The data volumes back this up as a sensible hypothesis
Facts => big => distribute
Dimensions => small => replicate
Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key
So how does this help us to run queries without distributed joins?
This query involves:
- Joins between Dimensions
- Joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern?
(diagram of sequential network hops: Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs…)
But by balancing Replication and Partitioning we don't need all those hops
Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'
(diagram: Transactions, Cashflows and Mtms in Partitioned Storage)
Stage 1: Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 2: Cluster Join to get Facts
Join Dimensions in Query Layer, then join Facts across the cluster
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 2: Join the facts together efficiently, as we know they are collocated
Stage 3: Augment raw Facts with relevant Dimensions
Join Dimensions in Query Layer, join Facts across the cluster, then join Dimensions in the Query Layer again
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 3: Bind relevant dimensions to the result
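The three stages can be simulated with plain maps standing in for the replicated dimension caches and the partitioned fact store (the book/cost-centre shapes and all names here are invented for illustration; they are not ODC's actual model or API):

```java
import java.util.*;
import java.util.stream.*;

// Stand-in for the three query stages. Replicated dimensions live in plain
// maps present on every node; partitioned facts are keyed by trade id, so
// related facts are collocated and can be joined without network hops.
public class ThreeStageQuery {
    // Replicated dimension: every node holds a full copy.
    static final Map<String, String> BOOK_TO_COST_CENTRE = Map.of("B1", "CC1", "B2", "CC2");
    // Partitioned facts, collocated by trade id.
    static final Map<String, String> TRADE_TO_BOOK = Map.of("T1", "B1", "T2", "B2", "T3", "B1");
    static final Map<String, Double> TRADE_TO_MTM  = Map.of("T1", 10.0, "T2", -5.0, "T3", 2.5);

    static List<String> query(String costCentre) {
        // Stage 1: join dimensions in the query layer to resolve the where
        // clause into fact keys (here: which books belong to the cost centre).
        Set<String> books = BOOK_TO_COST_CENTRE.entrySet().stream()
                .filter(e -> e.getValue().equals(costCentre))
                .map(Map.Entry::getKey).collect(Collectors.toSet());
        // Stage 2: join facts in-partition (trades and MTMs share a key).
        // Stage 3: bind the dimension data back onto each result row.
        return TRADE_TO_BOOK.entrySet().stream()
                .filter(e -> books.contains(e.getValue()))
                .map(e -> e.getKey() + " mtm=" + TRADE_TO_MTM.get(e.getKey()) + " cc=" + costCentre)
                .sorted().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(query("CC1")); // [T1 mtm=10.0 cc=CC1, T3 mtm=2.5 cc=CC1]
    }
}
```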
Bringing it together
(diagram: Java client API sitting over Replicated Dimensions and Partitioned Facts)
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and having intermediate results
We get to do this…
(diagram: entities held normalised, as singletons)
…and this…
(diagram: the Trade/Party/Trader graph at Versions 1 to 4)
…and this…
(diagram: reconstituting a previous time slice)
…without the problems of this…
…or this…
…all at the speed of this… well, almost
But there is a fly in the ointment…
I lied earlier. These aren't all Facts
(diagram: Facts vs Dimensions)
This is a dimension:
- It has a different key to the Facts
- And it's BIG
We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem
Fortunately there is a simple solution: the Connected Replication Pattern
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"
If there are no Trades for Goldmans in the data store, then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data, some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80% of data remains unused
So we only replicate 'Connected' or 'Used' dimensions
As data is written to the data store, we keep our 'Connected Caches' up to date
(diagram: Processing Layer holding replicated Dimension Caches; Data Layer holding Fact Storage (Partitioned): Transactions, Cashflows, Mtms)
As new Facts are added, relevant Dimensions that they reference are moved to processing-layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all its first-level references to be triggered
(diagram: Save Trade enters the Partitioned Cache; the Cache Store fires a Trigger for the Trade's references: Party, Alias, Source Book, Ccy; Data Layer (all normalised), Query Layer (with connected dimension caches))
This updates the connected caches
(diagram: Party, Alias, Source Book and Ccy copied into the connected dimension caches)
The process recurses through the object graph
(diagram: second-level references, such as the Party's LedgerBook, follow in turn)
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)
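That recursion can be sketched with maps standing in for the domain model's foreign-key arcs and the connected cache (the entity keys echo the slides; the structure and names are illustrative assumptions, not the actual implementation):

```java
import java.util.*;

// Sketch of Connected Replication: when a fact is saved, recurse through its
// foreign keys and copy only the dimensions it actually references into the
// replicated cache. Dimensions no fact points at are never replicated.
public class ConnectedReplication {
    // Foreign-key arcs of the domain model: entity -> entities it references.
    static final Map<String, List<String>> REFS = Map.of(
            "TradeT1", List.of("PartyP1", "TraderX"),
            "PartyP1", List.of("LedgerBookL1"),   // second-level reference
            "TraderX", List.of(),
            "LedgerBookL1", List.of());

    // Stand-in for the replicated 'connected' dimension cache.
    static final Set<String> connectedCache = new TreeSet<>();

    static void onSave(String key) {
        for (String dim : REFS.getOrDefault(key, List.of())) {
            if (connectedCache.add(dim)) { // replicate once, then
                onSave(dim);               // recurse through its own arcs
            }
        }
    }

    public static void main(String[] args) {
        onSave("TradeT1"); // saving the trade triggers its references
        System.out.println(connectedCache); // [LedgerBookL1, PartyP1, TraderX]
    }
}
```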
Limitations of this approach
- Data set size: the size of connected dimensions limits scalability
- Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)
Conclusion
- Traditional database architectures are inappropriate for very low-latency or very high-throughput applications
- At one end of the scale are the huge shared-nothing architectures. These favour scalability
- At the other end are in-memory architectures, ideally using a single address space
- You can blend the two approaches (for example ODC)
- ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, we can do any join in a single step
- With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern
The End
- Further details online: http://www.benstopford.com
- Questions?
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Three Tools of Distributed Data Architecture
Indexing
Replication
Partitioning
How should we use these tools
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scalesKeys Aa-Ap
Scalable storage bandwidth and processing
Associating data in different partitions implies moving it
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
How should we use these tools?
Replication puts data everywhere
Wherever you go the data will be there
But your storage is limited by the memory on a node
Partitioning scales
Scalable storage, bandwidth and processing
Associating data in different partitions implies moving it
So we have some data. Our data is bound together in a model (Trade, Party, Trader, Desk, Name, Sub).
Which we save
Binding them back together involves a "distributed join" ⇒ lots of network hops
The hops have to be spread over time
Lots of network hops make it slow
OK, so what if we held it all together? "Denormalised"
Hence denormalisation is FAST (for reads)
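The read-path difference is easy to sketch in Java. This is an illustrative toy, not ODC code: the entity shapes (Trade, Party, Trader) and the plain HashMaps standing in for distributed caches are assumptions:

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch: normalised vs denormalised read paths.
// In a distributed cache each map lookup is a potential network hop.
public class ReadPaths {
    record Trade(String id, String partyId) {}
    record Party(String id, String traderId) {}
    record Trader(String id, String name) {}
    record DenormalisedTrade(String id, String partyId, String traderName) {}

    static Map<String, Trade> trades = new HashMap<>();
    static Map<String, Party> parties = new HashMap<>();
    static Map<String, Trader> traders = new HashMap<>();
    static Map<String, DenormalisedTrade> denormalised = new HashMap<>();

    // Normalised read: three chained lookups (three potential hops).
    static String traderNameNormalised(String tradeId) {
        Trade t = trades.get(tradeId);
        Party p = parties.get(t.partyId());
        return traders.get(p.traderId()).name();
    }

    // Denormalised read: one lookup, the join is already done.
    static String traderNameDenormalised(String tradeId) {
        return denormalised.get(tradeId).traderName();
    }
}
```

The normalised path pays for each foreign key it follows; the denormalised path pays once, which is exactly why denormalisation is fast for reads.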
Denormalisation implies the duplication of some sub-entities
…and that means managing consistency over lots of copies
…and all the duplication means you run out of space really quickly
Space issues are exaggerated further when data is versioned
(each version duplicates the whole Trade, Party, Trader graph: Version 1 through Version 4)
…and you need versioning to do MVCC
And reconstituting a previous time slice becomes very difficult
So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Independently Versioned. Data is Singleton.
Binding them back together involves a "distributed join" ⇒ lots of network hops
Whereas in the denormalised model the join is already done
So what we want are the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern is all about
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?
It's all about the keys
We can collocate data with common keys, but if keys crosscut, the only way to collocate is to replicate
We tackle this problem with a hybrid model: Trade is partitioned; Party and Trader are replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are big; dimensions are small
Facts have one key that relates them all (used to partition)
Dimensions have many keys (which crosscut the partitioning key)
Looking at the data:
Facts ⇒ big, common keys
Dimensions ⇒ small, crosscutting keys
We remember we are a grid. We should avoid the distributed join…
…so we only want to 'join' data that is in the same process
Trades and MTMs share a common key. Use a Key Assignment Policy (e.g. KeyAssociation in Coherence)
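A key-assignment policy can be sketched in plain Java. Coherence's KeyAssociation interface works along these lines (a key exposes the key it should be collocated with), but this block is a hand-rolled analogue with assumed names, not Coherence code:

```java
// Sketch of a key-assignment policy: facts route on their shared
// trade id, so a Trade and its MTMs land in the same partition.
public class KeyAssignment {
    interface Associated { Object associatedKey(); }

    // An MTM is keyed by its own id but routes on its trade's id.
    record MtmKey(String mtmId, String tradeId) implements Associated {
        public Object associatedKey() { return tradeId; }
    }

    record TradeKey(String tradeId) implements Associated {
        public Object associatedKey() { return tradeId; }
    }

    // Partition choice depends only on the associated (routing) key.
    static int partitionOf(Associated key, int partitionCount) {
        return Math.floorMod(key.associatedKey().hashCode(), partitionCount);
    }
}
```

Because partition choice depends only on the routing key, any two facts that share a trade id are guaranteed to be collocated, which is what makes the in-process fact join possible later.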
So we prescribe different physical storage for Facts and Dimensions: Trade (the fact) is partitioned; Party and Trader (dimensions) are replicated
Facts are partitioned; dimensions are replicated
(Data Layer: Transactions, Cashflows and MTMs in partitioned Fact storage; Query Layer above)
Facts are partitioned; dimensions are replicated
Facts (Transactions, Cashflows, MTMs) ⇒ distribute and partition. Dimensions ⇒ replicate
The data volumes back this up as a sensible hypothesis:
Facts ⇒ big ⇒ distribute
Dimensions ⇒ small ⇒ replicate
Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key
So how does this help us to run queries without distributed joins?
This query involves:
• Joins between Dimensions
• Joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern?
A chain of round trips, spread over time: Get Cost Centres → Get LedgerBooks → Get SourceBooks → Get Transactions → Get MTMs → Get Legs → Get Cost Centres
But by balancing Replication and Partitioning we don't need all those hops
Stage 1: Focus on the where clause (Where Cost Centre = 'CC1')
Stage 1: Get the right keys to query the Facts. Join Dimensions in the Query Layer
Stage 2: Cluster Join to get Facts. Join Facts across the cluster
Stage 2: Join the facts together efficiently, as we know they are collocated
Stage 3: Augment raw Facts with relevant Dimensions. Join Dimensions in the Query Layer
Stage 3: Bind relevant dimensions to the result
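The three stages can be sketched with plain maps standing in for the replicated and partitioned caches. All names here (Book, Transaction, Mtm, query) are illustrative assumptions, not the ODC API:

```java
import java.util.*;
import java.util.stream.*;

// Sketch of the three query stages over replicated dimensions
// and partitioned facts.
public class SnowflakeQuery {
    record Book(String bookId, String costCentreId) {}
    record Transaction(String tradeId, String bookId, double amount) {}
    record Mtm(String tradeId, double value) {}
    record Row(Transaction txn, Mtm mtm, Book book) {}

    // Replicated dimension cache: available in-process on every node.
    static Map<String, Book> books = new HashMap<>();
    // Partitioned fact storage, keyed by the shared trade id.
    static Map<String, Transaction> transactions = new HashMap<>();
    static Map<String, Mtm> mtms = new HashMap<>();

    static List<Row> query(String costCentreId) {
        // Stage 1: join dimensions in the query layer (all local)
        // to get the right keys to query the facts.
        Set<String> bookIds = books.values().stream()
                .filter(b -> b.costCentreId().equals(costCentreId))
                .map(Book::bookId)
                .collect(Collectors.toSet());
        // Stage 2: join the facts; Transaction and MTM share the trade
        // id, so in the real grid each pair joins inside one partition.
        return transactions.values().stream()
                .filter(t -> bookIds.contains(t.bookId()))
                // Stage 3: bind the replicated dimensions to the result.
                .map(t -> new Row(t, mtms.get(t.tradeId()), books.get(t.bookId())))
                .collect(Collectors.toList());
    }
}
```

The point of the staging is that no step ships intermediate results between nodes: stage 1 and stage 3 run against in-process replicas, and stage 2 runs within partitions.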
Bringing it together: a Java client API over Replicated Dimensions and Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and having intermediate results
We get to do this… (hold entities normalised)
…and this… (version them independently)
…and this… (reconstitute past time slices)
…without the problems of this… or this…
…all at the speed of this… well, almost
But there is a fly in the ointment…
I lied earlier. These aren't all Facts.
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space ⇒ Big Dimensions are a problem
Fortunately there is a simple solution:
The Connected Replication Pattern
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".
If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty
Looking at the Dimension data, some are quite large
But Connected Dimension data is tiny by comparison
One recent independent study from the database community showed that 80% of data remains unused
So we only replicate 'Connected' or 'Used' dimensions
As data is written to the data store we keep our 'Connected Caches' up to date
(Data Layer: partitioned Fact storage for Transactions, Cashflows and MTMs; Processing Layer: replicated Dimension caches)
As new Facts are added, relevant Dimensions that they reference are moved to processing-layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all its first-level references to be triggered
(Save Trade hits the Partitioned Cache; a Cache Store trigger pushes the Trade's references, Party, Alias, Source, Book and Ccy, from the Data Layer (all normalised) into the Query Layer's connected dimension caches)
This updates the connected caches
The process recurses through the object graph (e.g. on to the Party's own references, such as its LedgerBook)
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
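A minimal sketch of that recursion, with assumed names (Dimension, onFactSaved) rather than the real ODC triggers:

```java
import java.util.*;

// Sketch of the Connected Replication trigger: when a fact is saved,
// recurse through its foreign-key references and copy only the
// touched ("connected") dimensions into the replicated caches.
public class ConnectedReplication {
    // A dimension knows which other dimensions it references.
    record Dimension(String key, List<String> references) {}

    static Map<String, Dimension> dimensionStore = new HashMap<>(); // normalised data layer
    static Map<String, Dimension> connectedCache = new HashMap<>(); // replicated layer

    // Called when a fact is written; refs are its first-level references.
    static void onFactSaved(List<String> refs) {
        for (String key : refs) replicate(key);
    }

    static void replicate(String key) {
        if (connectedCache.containsKey(key)) return; // already connected
        Dimension d = dimensionStore.get(key);
        if (d == null) return;
        connectedCache.put(key, d);
        for (String ref : d.references()) replicate(ref); // recurse the arcs
    }
}
```

Dimensions that no fact ever references (the Goldmans Counterparty with no Trades, say) are simply never visited, so they never take up replicated space.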
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)
Limitations of this approach:
• Data set size: the size of connected dimensions limits scalability
• Joins are only supported between Facts that can share a partitioning key (but any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low-latency or very high-throughput applications
• At one end of the scale are the huge shared-nothing architectures. These favour scalability
• At the other end are in-memory architectures, ideally using a single address space
• You can blend the two approaches (for example ODC)
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so we can do any join in a single step
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern
The End
• Further details online: http://www.benstopford.com
• Questions?
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Partitioning scales (keys Aa–Ap): scalable storage, bandwidth and processing. But associating data in different partitions implies moving it.
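The partitioning idea can be sketched in a few lines of Java. This is a toy illustration (class and method names are mine; real grids use smarter, rebalance-friendly assignment than a bare hash):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: records are spread across nodes by hashing their key.
// Adding nodes adds storage, bandwidth and processing capacity.
public class PartitioningSketch {
    // Deterministically assign a record to a node by hashing its key.
    static int nodeFor(String key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount); // floorMod: hashCode() may be negative
    }

    public static void main(String[] args) {
        int nodes = 4;
        Map<Integer, Integer> recordsPerNode = new HashMap<>();
        for (String key : new String[]{"Aa", "Ab", "Ac", "Ap", "Bq", "Zz"}) {
            recordsPerNode.merge(nodeFor(key, nodes), 1, Integer::sum);
        }
        // The same key always lands on the same node, so lookups are one hop...
        assert nodeFor("Aa", nodes) == nodeFor("Aa", nodes);
        // ...but two related records generally hash to different nodes,
        // so joining them means moving data across the network.
        System.out.println("records per node: " + recordsPerNode);
    }
}
```

The last point is the catch: a Trade on one node and its Party on another can only be associated via a network hop, which is the problem the rest of the talk works around.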
So we have some data. Our data is bound together in a model.
[Diagram: an object graph of Trade, Party, Trader, Desk, Name, Sub entities]
Which we save.
[Diagram: each entity — Trade, Party, Trader — saved separately]
Binding them back together involves a "distributed join" => lots of network hops.
[Diagram: Trade, Party and Trader records spread across machines]
The hops have to be spread over time.
[Diagram: network calls plotted against time]
Lots of network hops make it slow.
OK – what if we held it all together, "denormalised"?
Hence denormalisation is FAST (for reads).
But denormalisation implies the duplication of some sub-entities…
…and that means managing consistency over lots of copies…
…and all the duplication means you run out of space really quickly.
Space issues are exaggerated further when data is versioned.
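The space and consistency cost is simple arithmetic. This sketch (illustrative names and numbers only) counts the copies of one embedded sub-entity that must all be kept in step:

```java
public class DenormalisationCost {
    // Copies of one shared sub-entity (e.g. a Party) that must be kept
    // consistent when it is embedded in every denormalised fact document.
    static int embeddedCopies(int facts, int versionsPerFact) {
        return facts * versionsPerFact;
    }

    // In a normalised store the sub-entity is a single shared row,
    // referenced by key, however many facts or versions exist.
    static int normalisedCopies() {
        return 1;
    }

    public static void main(String[] args) {
        // 1,000 trades, each kept at 4 versions for MVCC:
        System.out.println(embeddedCopies(1000, 4)); // 4000 copies to rewrite on any change
        System.out.println(normalisedCopies());      // 1
    }
}
```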
[Diagram: four denormalised copies of the Trade/Party/Trader graph, versions 1–4]
…and you need versioning to do MVCC.
And reconstituting a previous time slice becomes very difficult.
[Diagram: overlapping versions of Trade, Party and Trader records]
So we want to hold entities separately (normalised) to alleviate concerns around consistency and space usage.
Remember this means the object graph will be split across multiple machines.
[Diagram: Trade, Party and Trader on separate machines — independently versioned, each datum a singleton]
Binding them back together involves a "distributed join" => lots of network hops.
[Diagram: Trade, Party and Trader rejoined across the network]
Whereas in the denormalised model the join is already done.
So what we want are the advantages of a normalised store at the speed of a denormalised one. This is what using Snowflake Schemas and the Connected Replication pattern is all about.
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?
It's all about the keys.
We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate.
[Diagram: entities linked by common keys vs. crosscutting keys]
We tackle this problem with a hybrid model.
[Diagram: Trade partitioned; Party and Trader replicated]
We adapt the concept of a Snowflake Schema, taking the concept of Facts and Dimensions.
Everything starts from a Core Fact (Trades, for us).
Facts are big; dimensions are small.
Facts have one key that relates them all (used to partition).
Dimensions have many keys (which crosscut the partitioning key).
Looking at the data: Facts => big, with common keys. Dimensions => small, with crosscutting keys.
We remember we are a grid. We should avoid the distributed join…
…so we only want to 'join' data that is in the same process.
[Diagram: Trades and MTMs share a common key — use a key assignment policy (e.g. KeyAssociation in Coherence)]
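A minimal sketch of the idea behind a key assignment policy (my own stand-in, not the Coherence KeyAssociation API): each fact key also carries the trade id it is associated with, and partitioning hashes that associated key rather than the whole key, so related facts land together:

```java
public class KeyAssociationSketch {
    // A fact key that also carries the key it should be collocated with.
    static final class FactKey {
        final String entity, id, associatedTradeId;
        FactKey(String entity, String id, String associatedTradeId) {
            this.entity = entity;
            this.id = id;
            this.associatedTradeId = associatedTradeId;
        }
    }

    // Partition on the associated key rather than the whole key, so a Trade
    // and its MTMs land in the same partition and can be joined in-process.
    static int partitionOf(FactKey key, int partitions) {
        return Math.floorMod(key.associatedTradeId.hashCode(), partitions);
    }

    public static void main(String[] args) {
        FactKey trade = new FactKey("Trade", "T42", "T42");
        FactKey mtm = new FactKey("MTM", "M7", "T42");
        // Collocated: both hash on "T42".
        System.out.println(partitionOf(trade, 16) == partitionOf(mtm, 16)); // true
    }
}
```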
So we prescribe different physical storage for Facts and Dimensions.
[Diagram: Trade partitioned; Party and Trader replicated]
Facts are partitioned; dimensions are replicated.
[Diagram: a Query Layer above a Data Layer, with Transactions, Cashflows and MTMs held in Fact Storage (partitioned)]
Facts are partitioned; dimensions are replicated.
[Diagram: Facts (distributed/partitioned) — Transactions, Cashflows, MTMs in Fact Storage — alongside Dimensions (replicated)]
The data volumes back this up as a sensible hypothesis: Facts => big => distribute; Dimensions => small => replicate.
Key Point: we use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key.
[Diagram: replicate vs. distribute]
So how does this help us to run queries without distributed joins?
This query involves joins between Dimensions and joins between Facts:
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern? Get Cost Centres, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs — each a separate remote call.
[Diagram: the calls spread out along a network/time axis]
But by balancing Replication and Partitioning we don't need all those hops.
Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'.
Stage 1: Get the right keys to query the Facts — join Dimensions in the Query Layer. (Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1')
[Diagram: Transactions, Cashflows and MTMs in partitioned storage]
Stage 2: Cluster join to get Facts — join Dimensions in the Query Layer, then join Facts across the cluster. (Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1')
Stage 2: Join the facts together efficiently, as we know they are collocated.
[Diagram: Transactions, Cashflows and MTMs in partitioned storage]
Stage 3: Augment raw Facts with relevant Dimensions — join Dimensions in the Query Layer, join Facts across the cluster, then join Dimensions in the Query Layer again. (Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1')
Stage 3: Bind relevant dimensions to the result.
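The three stages can be sketched with plain maps standing in for the replicated and partitioned stores (all class, field and key names here are illustrative, not ODC's API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SnowflakeQuerySketch {
    // Replicated dimension: every query node holds the full book -> cost centre map.
    static final Map<String, String> costCentreOfBook = new HashMap<>();
    // Partitioned facts: transactions per book, with MTMs collocated alongside them.
    static final Map<String, List<String>> transactionsByBook = new HashMap<>();
    static final Map<String, Double> mtmByTransaction = new HashMap<>();

    static List<String> query(String costCentre) {
        List<String> result = new ArrayList<>();
        // Stage 1: join dimensions locally to turn the where clause into fact keys.
        for (Map.Entry<String, String> book : costCentreOfBook.entrySet()) {
            if (!book.getValue().equals(costCentre)) continue;
            // Stage 2: fetch facts; Transactions and MTMs share a partitioning
            // key, so this join happens inside one partition with no extra hops.
            for (String txn : transactionsByBook.getOrDefault(book.getKey(), List.of())) {
                Double mtm = mtmByTransaction.get(txn);
                // Stage 3: bind replicated dimension data onto the result.
                result.add(txn + " mtm=" + mtm + " book=" + book.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        costCentreOfBook.put("BookA", "CC1");
        costCentreOfBook.put("BookB", "CC2");
        transactionsByBook.put("BookA", List.of("Txn1"));
        transactionsByBook.put("BookB", List.of("Txn2"));
        mtmByTransaction.put("Txn1", 1.5);
        System.out.println(query("CC1")); // only BookA's transaction qualifies
    }
}
```

The point of the design is visible in the code: the only lookups that could span the cluster are the fact fetches, and those are collocated by the partitioning key; everything else is a local map read.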
Bringing it together:
[Diagram: a Java client API over replicated Dimensions and partitioned Facts]
We never have to do a distributed join. So all the big stuff is held partitioned, and we can join without shipping keys around and having intermediate results.
We get to do this… [Diagram: normalised Trade, Party and Trader held separately]
…and this… [Diagram: versions 1–4 of the Trade/Party/Trader graph]
…and this… [Diagram: reconstituting a previous time slice]
…without the problems of this… …or this…
…all at the speed of this… well, almost.
But there is a fly in the ointment… I lied earlier: these aren't all Facts.
[Diagram: Facts vs. Dimensions — one entity is really a dimension: it has a different key to the Facts, and it's BIG]
We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.
Fortunately there is a simple solution: the Connected Replication Pattern.
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".
If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.
Looking at the Dimension data, some are quite large. But Connected Dimension Data is tiny by comparison.
One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'Connected' (or 'Used') dimensions.
As data is written to the data store we keep our 'Connected Caches' up to date.
[Diagram: a Data Layer with Transactions, Cashflows and MTMs in Fact Storage (partitioned), and a Processing Layer holding replicated Dimension Caches]
As new Facts are added, relevant Dimensions that they reference are moved to the processing-layer caches.
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.
Saving a trade causes all its first-level references to be triggered.
[Diagram: Save Trade → Partitioned Cache → Cache Store → Trigger; a Data Layer (all normalised) beneath a Query Layer (with connected dimension caches); entities Trade, Party, Alias, Source, Book, Ccy]
This updates the connected caches.
[Diagram: the Trade, Party, Alias, Source, Book, Ccy graph with its connected caches refreshed]
The process recurses through the object graph.
[Diagram: the recursion reaching second-level entities such as Party and LedgerBook; Data Layer (all normalised), Query Layer (with connected dimension caches)]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
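The pattern can be sketched as a recursive walk over foreign-key arcs (an illustration of the idea, not ODC's implementation; names are mine):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ConnectedReplicationSketch {
    // Normalised data layer: each entity's outgoing foreign-key references.
    static final Map<String, List<String>> references = new HashMap<>();
    // Replicated layer: only dimensions reachable from a saved fact land here.
    static final Set<String> connectedCache = new HashSet<>();

    // On save, recurse through the object graph, replicating every
    // dimension the fact is 'connected' to (each one visited once).
    static void onSave(String key) {
        for (String dim : references.getOrDefault(key, List.of())) {
            if (connectedCache.add(dim)) {
                onSave(dim);
            }
        }
    }

    public static void main(String[] args) {
        references.put("Trade:T1", List.of("Party:P1", "Ccy:USD"));
        references.put("Party:P1", List.of("LedgerBook:L9"));
        onSave("Trade:T1");
        // Party:P1, Ccy:USD and LedgerBook:L9 get replicated; a party no
        // trade references (say Party:P2) never does.
        System.out.println(connectedCache.size()); // 3
    }
}
```

The `connectedCache.add` check doubles as the visited-set, so the recursion terminates even if the domain model contains reference cycles.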
Limitations of this approach:
- Data set size: the size of the connected dimensions limits scalability.
- Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).
Conclusion:
- Traditional database architectures are inappropriate for very low latency or very high throughput applications.
- At one end of the scale are the huge shared-nothing architectures. These favour scalability.
- At the other end are in-memory architectures, ideally using a single address space.
- You can blend the two approaches (for example, ODC).
- ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so we can do any join in a single step.
- With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End.
- Further details online: http://www.benstopford.com
- Questions?
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
So we have some dataOur data is bound together
in a model
Trade
PartyTrader
Desk
Name
Sub
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
Which we save
Trade
Party
Trader
Trade
Party
Trader
Binding them back together involves a "distributed join" => lots of network hops
Trade
Party
Trader
Trade
Party
Trader
The hops have to be spread over time
[Diagram: network hops spread along a time axis]
Lots of network hops makes it slow
OK, what if we held it all together? "Denormalised"
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
…and that means managing consistency over lots of copies
…and all the duplication means you run out of space really quickly
Space issues are exacerbated further when data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
…and you need versioning to do MVCC
And reconstituting a previous time slice becomes very difficult
[Diagram: a tangle of Trade, Party and Trader versions]
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a "distributed join" => lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas in the denormalised model the join is already done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?
It's all about the keys
We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate
Common Keys
Crosscutting Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big, dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts => Big, common keys
Dimensions => Small, crosscutting keys
We remember we are a grid. We should avoid the distributed join
… so we only want to 'join' data that is in the same process
Trades
MTMs
Common Key
Use a Key Assignment Policy (e.g. KeyAssociation in Coherence)
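In Coherence terms, that collocation is driven by a key-association policy: the fact's key reports the partitioning key as its "associated key", so related facts land in the same partition. A minimal self-contained sketch, where the `KeyAssociation` interface is a stand-in for Coherence's `com.tangosol.net.cache.KeyAssociation` and `MtmKey` is an illustrative name:

```java
// Stand-in for Coherence's KeyAssociation: a key that declares
// which other key it should be stored alongside.
interface KeyAssociation {
    Object getAssociatedKey();
}

// Illustrative key for an MTM fact: it carries the id of the Trade
// it belongs to, and collocation is driven by that trade id.
final class MtmKey implements KeyAssociation {
    final String mtmId;
    final String tradeId;

    MtmKey(String mtmId, String tradeId) {
        this.mtmId = mtmId;
        this.tradeId = tradeId;
    }

    // The partitioning service hashes this value rather than the raw
    // key, so the MTM lands in the same partition as its Trade.
    @Override
    public Object getAssociatedKey() {
        return tradeId;
    }
}

class KeyAssociationDemo {
    // Simplified partition assignment: hash the associated key.
    static int partitionFor(Object key, int partitionCount) {
        Object effective = (key instanceof KeyAssociation)
                ? ((KeyAssociation) key).getAssociatedKey()
                : key;
        return Math.floorMod(effective.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        MtmKey mtm = new MtmKey("MTM-9", "TRADE-1");
        // Same partition as the parent trade: the join stays in-process.
        System.out.println(partitionFor(mtm, 13) == partitionFor("TRADE-1", 13)); // prints "true"
    }
}
```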
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data Layer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions (replicated)
Mtms
Fact Storage (Partitioned)
Facts (distributed, partitioned)
The data volumes back this up as a sensible hypothesis
Facts => Big => Distribute
Dimensions => Small => Replicate
Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key
Replicate
Distribute
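The key point above can be sketched as a toy hybrid store: big facts are spread across nodes by their partitioning key, while small dimensions are copied to every node so any local join can reach them (the class and method names here are invented for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A toy hybrid store: facts are partitioned by a common key,
// dimensions are replicated to every node.
class HybridStore {
    final List<Map<String, String>> factPartitions = new ArrayList<>();
    final List<Map<String, String>> dimensionReplicas = new ArrayList<>();

    HybridStore(int nodes) {
        for (int i = 0; i < nodes; i++) {
            factPartitions.add(new HashMap<>());
            dimensionReplicas.add(new HashMap<>());
        }
    }

    // Facts (big): stored on exactly one node, chosen by the partitioning key.
    void putFact(String partitioningKey, String fact) {
        int node = Math.floorMod(partitioningKey.hashCode(), factPartitions.size());
        factPartitions.get(node).put(partitioningKey, fact);
    }

    // Dimensions (small): copied to every node.
    void putDimension(String key, String dimension) {
        for (Map<String, String> replica : dimensionReplicas) {
            replica.put(key, dimension);
        }
    }

    // A 'join' on a node sees its local facts plus ALL dimensions.
    String joinOnNode(int node, String partitioningKey, String dimensionKey) {
        return factPartitions.get(node).get(partitioningKey)
                + "/" + dimensionReplicas.get(node).get(dimensionKey);
    }

    public static void main(String[] args) {
        HybridStore store = new HybridStore(3);
        store.putFact("TRADE-1", "trade1");
        store.putDimension("CCY-USD", "usd");
        int owner = Math.floorMod("TRADE-1".hashCode(), 3);
        System.out.println(store.joinOnNode(owner, "TRADE-1", "CCY-USD")); // prints "trade1/usd"
    }
}
```

A query routed to the node that owns a trade can then bind any dimension without leaving the process.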
So how does this help us run queries without distributed joins?
This query involves:
• Joins between Dimensions
• Joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern
[Diagram: a chain of sequential calls (Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centers), each a network hop, spread over time]
But by balancing Replication and Partitioning we don't need all those hops
Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1: Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2: Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 2: Join the facts together efficiently, as we know they are collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3: Augment raw Facts with relevant Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 3: Bind relevant dimensions to the result
Bringing it together
Java client API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and having intermediate results
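The three stages can be sketched end to end with plain maps standing in for the replicated and partitioned caches (all data and names here are invented for illustration):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy version of the staged query: dimensions resolve the fact keys,
// collocated facts join locally, dimensions decorate the result.
class StagedQuery {
    // Replicated dimension data: cost centre -> trade ids (local, no hops).
    static final Map<String, List<String>> COST_CENTRE_TO_TRADES =
            Map.of("CC1", List.of("T1", "T2"), "CC2", List.of("T3"));

    // Partitioned facts keyed by trade id; Transaction and MTM are collocated.
    static final Map<String, String> TRANSACTIONS =
            Map.of("T1", "txn1", "T2", "txn2", "T3", "txn3");
    static final Map<String, String> MTMS =
            Map.of("T1", "mtm1", "T2", "mtm2", "T3", "mtm3");

    // Replicated reference data bound in at the end.
    static final Map<String, String> REF_DATA =
            Map.of("T1", "refA", "T2", "refB", "T3", "refC");

    static List<String> query(String costCentre) {
        // Stage 1: join dimensions in the query layer to get the fact keys.
        List<String> tradeIds =
                COST_CENTRE_TO_TRADES.getOrDefault(costCentre, List.of());
        // Stage 2: join the facts; collocation makes each lookup in-process.
        // Stage 3: bind the relevant dimensions to the result.
        return tradeIds.stream()
                .map(id -> TRANSACTIONS.get(id) + "+" + MTMS.get(id) + "+" + REF_DATA.get(id))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(query("CC1")); // prints "[txn1+mtm1+refA, txn2+mtm2+refB]"
    }
}
```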
We get to do this…
Trade
Party
Trader
Trade
Party
Trader
…and this…
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
…without the problems of this…
…or this
all at the speed of this… well, almost
But there is a fly in the ointment…
I lied earlier. These aren't all Facts
Facts
Dimensions
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80% of data remains unused
So we only replicate 'Connected' or 'Used' dimensions
As data is written to the data store we keep our 'Connected Caches' up to date
[Diagram: Data Layer with Transactions, Cashflows and MTMs in partitioned Fact Storage, feeding replicated Dimension Caches in the Processing Layer]
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all its 1st-level references to be triggered
Trade
Party
Alias
SourceBook
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
SourceBook
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
SourceBook
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
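A minimal sketch of that recursion, representing the domain model as an adjacency list of foreign-key references (the entity names mirror the slides, but the structure is invented for illustration):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy connected-replication walk: when a fact is saved, recurse through
// its references and replicate only the dimensions actually reached.
class ConnectedReplication {
    // Domain model as an adjacency list: entity -> entities it references.
    static final Map<String, List<String>> REFERENCES = Map.of(
            "Trade", List.of("Party", "SourceBook", "Ccy"),
            "Party", List.of("Alias", "LedgerBook"),
            "SourceBook", List.of(),
            "Ccy", List.of(),
            "Alias", List.of(),
            "LedgerBook", List.of());

    // Dimensions reached from the saved fact end up in the replicated cache.
    static Set<String> onSave(String fact) {
        Set<String> connected = new LinkedHashSet<>();
        replicate(fact, connected);
        connected.remove(fact); // the fact itself stays partitioned
        return connected;
    }

    private static void replicate(String entity, Set<String> seen) {
        if (!seen.add(entity)) return; // already replicated: stop recursing
        for (String ref : REFERENCES.getOrDefault(entity, List.of())) {
            replicate(ref, seen);
        }
    }

    public static void main(String[] args) {
        System.out.println(onSave("Trade")); // prints "[Party, Alias, LedgerBook, SourceBook, Ccy]"
    }
}
```

A dimension nothing references (say, a counterparty with no trades) is never reached by the walk, so it is never replicated.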
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)
Limitations of this approach
• Data set size: the size of connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications
• At one end of the scale are the huge shared-nothing architectures. These favour scalability
• At the other end are in-memory architectures, ideally using a single address space
• You can blend the two approaches (for example ODC)
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so we can do any join in a single step
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern
The End
• Further details online: http://www.benstopford.com
• Questions
The hops have to be spread over time
Network
Time
Lots of network hops makes it slow
OK ndash what if we held it all together ldquoDenormalisedrdquo
Hence denormalisation is FAST
(for reads)
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1: Focus on the where clause: Where CostCentre = 'CC1'
(Diagram: Transactions, Cashflows and MTMs in partitioned storage.)
Stage 1: Get the right keys to query the Facts — join Dimensions in the Query Layer.
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where CostCentre = 'CC1'
(Diagram: Transactions, Cashflows and MTMs in partitioned storage.)
Stage 2: Cluster join to get the Facts — join Dimensions in the Query Layer, then join Facts across the cluster.
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where CostCentre = 'CC1'
Stage 2: Join the facts together efficiently, as we know they are collocated.
(Diagram: Transactions, Cashflows and MTMs in partitioned storage.)
Stage 3: Augment the raw Facts with the relevant Dimensions — join Dimensions in the Query Layer, join Facts across the cluster, then join Dimensions in the Query Layer again.
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where CostCentre = 'CC1'
Stage 3: Bind the relevant dimensions to the result.
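The three stages above can be sketched as a single function, assuming the dimension indexes are fully replicated on the querying node and the facts are partitioned by a common key. All shapes and names below are illustrative, not ODC's real API:

```java
import java.util.*;
import java.util.stream.*;

// Sketch of the three query stages against replicated dimensions and
// partitioned facts. Data is hard-coded and illustrative.
public class ThreeStageQuery {

    // Stage 1 input — replicated dimension index, flattened for brevity:
    // cost centre -> the fact partitioning keys it reaches.
    static final Map<String, Set<Long>> DIM_INDEX = Map.of("CC1", Set.of(1L, 2L));

    // Partitioned facts, keyed by the common partitioning key.
    static final Map<Long, String> TRANSACTIONS = Map.of(1L, "txn-1", 2L, "txn-2", 3L, "txn-3");
    static final Map<Long, String> MTMS = Map.of(1L, "mtm-1", 2L, "mtm-2", 3L, "mtm-3");

    static List<String> query(String costCentre) {
        // Stage 1: walk the replicated dimensions locally to get fact keys.
        Set<Long> keys = DIM_INDEX.getOrDefault(costCentre, Set.of());
        // Stage 2: join the facts key-to-key (in the grid this step runs
        // inside each owning partition, since the facts are collocated).
        // Stage 3: bind the dimension data onto each result row.
        return keys.stream().sorted()
                .map(k -> TRANSACTIONS.get(k) + "|" + MTMS.get(k) + "|" + costCentre)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(query("CC1"));
    }
}
```

The point of the sketch: no fact keys are shipped between nodes and no intermediate result sets accumulate; each stage works against data that is already local.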
Bringing it together:
(Diagram: a Java client API over Replicated Dimensions and Partitioned Facts.)
We never have to do a distributed join.
So all the big stuff is held partitioned, and we can join without shipping keys around and holding intermediate results.
We get to do this…
(Diagram: normalised Trade, Party and Trader held separately.)
…and this…
(Diagram: Trade, Party and Trader at Versions 1 through 4.)
…and this…
(Diagram: many Trades sharing the same Party and Trader.)
…without the problems of this… or this…
…all at the speed of this… well, almost.
But there is a fly in the ointment…
I lied earlier: these aren't all Facts.
(Diagram: the data set split into Facts and Dimensions. One entity is a Dimension — it has a different key to the Facts, and it's BIG.)
We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.
Fortunately there is a simple solution: the Connected Replication pattern.
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".
If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.
Looking at the Dimension data, some are quite large.
But connected Dimension data is tiny by comparison.
One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'connected' (or 'used') dimensions.
As data is written to the data store, we keep our 'connected caches' up to date.
(Diagram: a Data Layer with Transactions, Cashflows and MTMs in partitioned Fact storage, and a Processing Layer with replicated Dimension caches.)
As new Facts are added, the relevant Dimensions that they reference are moved to the processing-layer caches.
The replicated layer is updated by recursing through the arcs on the domain model when facts change.
Saving a trade causes all its first-level references to be triggered.
(Diagram: a Trade save hits the partitioned cache in the Data Layer (all normalised); a cache-store trigger pushes Party, Alias, Source Book and Ccy towards the Query Layer's connected dimension caches.)
This updates the connected caches.
(Diagram: Party, Alias, Source Book and Ccy now present in the Query Layer's connected dimension caches.)
The process recurses through the object graph.
(Diagram: the recursion reaches second-level references such as LedgerBook.)
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'connected' dimensions are replicated.
With Connected Replication, only 1/10th of the data needs to be replicated (on average).
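A minimal sketch of that recursion, assuming the domain model's foreign-key arcs are available as a map. The graph below is illustrative (loosely based on the entities in the slides), not the real domain model:

```java
import java.util.*;

// Sketch of Connected Replication: when a fact is saved, recurse through
// its foreign-key references and copy only the reached ("connected")
// dimensions into the replicated layer.
public class ConnectedReplication {

    // Foreign-key arcs of the domain model: entity -> entities it references.
    static final Map<String, List<String>> REFS = Map.of(
            "Trade", List.of("Party", "SourceBook", "Ccy"),
            "Party", List.of("Alias"),
            "SourceBook", List.of("Book"),
            "Book", List.of("LedgerBook"));

    // Walk the arcs from a saved fact, collecting every connected dimension.
    static Set<String> connectedDimensions(String fact) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(REFS.getOrDefault(fact, List.of()));
        while (!todo.isEmpty()) {
            String d = todo.pop();
            if (seen.add(d)) todo.addAll(REFS.getOrDefault(d, List.of())); // recurse
        }
        return seen;
    }

    public static void main(String[] args) {
        // Saving a Trade triggers its first-level references, then recurses
        // to second-level ones such as LedgerBook.
        System.out.println(connectedDimensions("Trade"));
    }
}
```

Dimensions that no fact ever reaches are simply never enqueued, which is why the replicated set stays an order of magnitude smaller than the full dimension data.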
Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between Facts that can share a partitioning key (but any dimension join can be supported).
Conclusion
• Traditional database architectures are inappropriate for very low-latency or very high-throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example, ODC).
• ODC attacks the distributed-join problem in an unusual way: by balancing Replication and Partitioning, we can do any join in a single step against partitioned storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
- Data Storage for Extreme Use Cases: The Lay of the Land and a Peek at ODC
- How fast is a HashMap lookup
- That's how long it takes light to travel a room
- How fast is a database lookup
- That's how long it takes light to go to Australia and back
- Computers really are very fast
- The problem is we're quite good at writing software that slows them down
- Question: Is it fair to compare the performance of a Database with a HashMap
- Of course not…
- Mechanical Sympathy
- Key Point 1
- Times are changing
- Traditional Database Architecture is Aging
- The Traditional Architecture
- Key Point 2
- How big is the internet
- How big is an average enterprise database
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- 1. The Shared Disk Architecture
- 2. The Shared Nothing Architecture
- Each machine is responsible for a subset of the records
- 3. The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you don't know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats: TPC-H Benchmarks on a 1TB data set
- So why haven't in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Distributed In Memory (Shared Nothing)
- Again we spread our data, but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Key Point 4: There are three key forces
- ODC
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC – Distributed, Shared Nothing, In Memory, Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data. Our data is bound together in a model
- Which we save
- Binding them back together involves a "distributed join" => lots of network hops
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK – what if we held it all together? "Denormalised"
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- …and that means managing consistency over lots of copies
- …and all the duplication means you run out of space really quickly
- Space issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Remember this means the object graph will be split across multiple machines
- Binding them back together involves a "distributed join" => lots of network hops (2)
- Whereas in the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the speed of a denormalised one
- Looking more closely: why does normalisation mean we have to spread data around the cluster
- It's all about the keys
- We can collocate data with common keys, but if they crosscut the only way to collocate is to replicate
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big, dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key)
- Looking at the data
- We remember we are a grid. We should avoid the distributed join
- … so we only want to 'join' data that is in the same process
- So we prescribe different physical storage for Facts and Dimensions
- Facts are partitioned, dimensions are replicated
- Facts are partitioned, dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- So how does this help us to run queries without distributed joins
- What would this look like without this pattern
- But by balancing Replication and Partitioning we don't need all those hops
- Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'
- Stage 1: Get the right keys to query the Facts
- Stage 2: Cluster Join to get Facts
- Stage 2: Join the facts together efficiently as we know they are collocated
- Stage 3: Augment raw Facts with relevant Dimensions
- Stage 3: Bind relevant dimensions to the result
- Bringing it together
- We get to do this…
- …and this…
- and this
- …without the problems of this…
- …or this
- all at the speed of this… well, almost
- But there is a fly in the ointment…
- I lied earlier. These aren't all Facts
- We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions, a large majority are never used
- If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
- Looking at the Dimension data, some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed that 80% of data remains unused
- As data is written to the data store we keep our 'Connected Caches' up to date
- The Replicated Layer is updated by recursing through the arcs on the domain model
- Saving a trade causes all its first-level references to be triggered
- This updates the connected caches
- The process recurses through the object graph
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
Hence denormalisation is FAST (for reads)
Denormalisation implies the duplication of some sub-entities
…and that means managing consistency over lots of copies
…and all the duplication means you run out of space really quickly
Space issues are exaggerated further when data is versioned
[Diagram: the same Trade–Party–Trader graph duplicated as Version 1 through Version 4]
…and you need versioning to do MVCC
And reconstituting a previous time slice becomes very difficult
[Diagram: Trade, Party and Trader versions scattered across copies]
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a "distributed join" => lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas in the denormalised model the join is already done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?
It's all about the keys
We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate
Common Keys
Crosscutting Keys
We tackle this problem with a hybrid model
[Diagram: Trade → Partitioned; Party, Trader → Replicated]
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big, dimensions are small
Facts have one key that relates them all (used to partition)
Dimensions have many keys (which crosscut the partitioning key)
Looking at the data:
Facts => Big, common keys
Dimensions => Small, crosscutting keys
We remember we are a grid. We should avoid the distributed join
… so we only want to 'join' data that is in the same process
Trades
MTMs
Common Key
Use a Key Assignment Policy (e.g. KeyAssociation in Coherence)
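A key-assignment policy can be illustrated without Coherence itself. Coherence's KeyAssociation lets a key declare an associated key that routing should use instead of the raw key; the Python sketch below mimics that idea (the class names and the `partition_for` helper are illustrative stand-ins, not the Coherence API) so that a Trade and its Cashflows hash to the same partition.

```python
# Sketch of a key-assignment policy in the spirit of Coherence's
# KeyAssociation: keys that declare an "associated key" are routed
# by that key, so related facts land on the same partition.

NUM_PARTITIONS = 8

class TradeKey:
    def __init__(self, trade_id):
        self.trade_id = trade_id
    def associated_key(self):
        return self.trade_id  # trades route by their own id

class CashflowKey:
    def __init__(self, cashflow_id, trade_id):
        self.cashflow_id = cashflow_id
        self.trade_id = trade_id
    def associated_key(self):
        return self.trade_id  # cashflows route by their parent trade

def partition_for(key):
    # Hash the associated key, not the raw key, so a Trade and all of
    # its Cashflows map to the same partition.
    return hash(key.associated_key()) % NUM_PARTITIONS

trade = TradeKey("T42")
flows = [CashflowKey(f"CF{i}", "T42") for i in range(3)]
assert all(partition_for(f) == partition_for(trade) for f in flows)
```

Because routing hashes the associated key, a join between a Trade and its Cashflows never needs to leave the owning process.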
So we prescribe different physical storage for Facts
and Dimensions
Trade → Partitioned
Party, Trader → Replicated
Facts are partitioned, dimensions are replicated
Data Layer
Transactions
Cashflows
Query Layer
Mtms
Fact Storage(Partitioned)
Trade
Party, Trader
Facts are partitioned, dimensions are replicated
Transactions
Cashflows
Dimensions (replicate)
Mtms
Fact Storage (Partitioned)
Facts (distribute / partition)
The data volumes back this up as a sensible hypothesis
Facts => Big => Distribute
Dimensions => Small => Replicate
Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key
Replicate
Distribute
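That key point can be made concrete with a toy model: facts hashed to one owning node by the partitioning key, dimensions broadcast to every node. This is an illustrative sketch under those assumptions (`HybridStore` and friends are invented names, not ODC's implementation):

```python
# Illustrative hybrid store: facts are partitioned by a single
# partitioning key; dimensions are replicated in full to every node.

class Node:
    def __init__(self):
        self.facts = {}       # only this node's share of the facts
        self.dimensions = {}  # a full copy of the (small) dimensions

class HybridStore:
    def __init__(self, num_nodes):
        self.nodes = [Node() for _ in range(num_nodes)]

    def put_fact(self, partition_key, fact):
        # One owner: route by the common partitioning key.
        node = self.nodes[hash(partition_key) % len(self.nodes)]
        node.facts.setdefault(partition_key, []).append(fact)

    def put_dimension(self, dim_key, dim):
        # Everywhere: broadcast so any node can join locally.
        for node in self.nodes:
            node.dimensions[dim_key] = dim

store = HybridStore(num_nodes=4)
store.put_fact("T1", {"type": "Trade", "ccy": "GBP"})
store.put_dimension("GBP", {"name": "Pound Sterling"})

# Every node can resolve the dimension without a network hop...
assert all("GBP" in n.dimensions for n in store.nodes)
# ...but the fact lives on exactly one node.
assert sum("T1" in n.facts for n in store.nodes) == 1
```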
So how does this help us to run queries without distributed joins?
This query involves:
• Joins between Dimensions
• Joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern? Each call is a separate network hop, spread over time:
Get Cost Centers
Get LedgerBooks
Get SourceBooks
Get Transactions
Get MTMs
Get Legs
Get Cost Centers
[Diagram axes: Network × Time]
But by balancing Replication and Partitioning we don't need all those hops
Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1: Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2: Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 2: Join the facts together efficiently, as we know they are collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3: Augment raw Facts with relevant Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 3: Bind relevant dimensions to the result
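The three stages can be sketched end-to-end over a single toy node (all names and data here are illustrative; in the real grid, Stage 2 would run in parallel on every partition):

```python
# The three query stages over one node's data:
# Stage 1: join the replicated dimensions to resolve the where
#          clause into fact keys.
# Stage 2: join facts locally -- they share a partitioning key,
#          so they are collocated on this node.
# Stage 3: bind the relevant dimensions onto the result.

# Replicated dimension data (present in full on every node).
dimensions = {
    "cost_centre": {"CC1": {"desc": "Rates desk"}},
    "transaction_to_cc": {"TX1": "CC1", "TX2": "CC2"},
}

# Partitioned fact data (this node's share), keyed by transaction id.
facts = {
    "transactions": {"TX1": {"notional": 1_000_000}},
    "mtms": {"TX1": {"value": 1234.5}},
}

def run_query(cost_centre):
    # Stage 1: where-clause against replicated dimensions -> fact keys.
    keys = [tx for tx, cc in dimensions["transaction_to_cc"].items()
            if cc == cost_centre]
    results = []
    for tx in keys:
        if tx not in facts["transactions"]:
            continue  # owned by another partition
        # Stage 2: local (collocated) fact-to-fact join.
        row = {"transaction": facts["transactions"][tx],
               "mtm": facts["mtms"][tx]}
        # Stage 3: bind the dimension data to the result.
        row["cost_centre"] = dimensions["cost_centre"][cost_centre]
        results.append(row)
    return results

rows = run_query("CC1")
assert len(rows) == 1 and rows[0]["mtm"]["value"] == 1234.5
```

No stage ships keys or intermediate results across the network: the where clause is resolved against local replicas, and the fact join happens where the facts already live.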
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and having intermediate results
We get to do this…
Trade
Party
Trader
Trade
Party
Trader
…and this…
[Diagram: the versioned Trade–Party–Trader graphs, Version 1 through Version 4]
and this
[Diagram: the scattered Trade, Party and Trader copies from the time-slice example]
…without the problems of this…
…or this
all at the speed of this… well, almost
But there is a fly in the ointment…
I lied earlier: these aren't all Facts
Facts
Dimensions
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem
Fortunately there is a simple solution: the Connected Replication Pattern
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"
If there are no Trades for Goldmans in the data store, then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data, some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80% of data remains unused
So we only replicate 'Connected' or 'Used' dimensions
As data is written to the data store we keep our 'Connected Caches' up to date
Data Layer
Dimension Caches (Replicated)
Transactions
Cashflows
Processing Layer
Mtms
Fact Storage (Partitioned)
As new Facts are added, relevant Dimensions that they reference are moved to processing-layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all its first-level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer (All Normalised)
Query Layer (with connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer (All Normalised)
Query Layer (with connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer (All Normalised)
Query Layer (with connected dimension Caches)
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)
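The recursion amounts to a simple graph walk over foreign-key references. A minimal sketch (the toy entities and the `refs` structure are illustrative, not ODC's domain model):

```python
# Connected Replication, sketched: when a fact is saved, walk its
# foreign-key references recursively and mark every dimension reached
# as "connected" -- only those get pushed to the replicated caches.

# Toy domain model: each entity lists the dimension keys it references.
entities = {
    "trade:1":       {"refs": ["party:goldman", "book:B7", "ccy:GBP"]},
    "party:goldman": {"refs": ["ledgerbook:L1"]},
    "book:B7":       {"refs": ["ccy:GBP"]},
    "ccy:GBP":       {"refs": []},
    "ledgerbook:L1": {"refs": []},
    # Present in the data layer but referenced by no saved fact,
    # so it should never be replicated:
    "party:unused":  {"refs": []},
}

def connected_dimensions(fact_key):
    """Recurse through the arcs of the domain model from a fact."""
    seen = set()
    stack = list(entities[fact_key]["refs"])  # first-level references
    while stack:
        key = stack.pop()
        if key in seen:
            continue  # already replicated; stop the recursion here
        seen.add(key)
        stack.extend(entities[key]["refs"])
    return seen

replicated = connected_dimensions("trade:1")
assert replicated == {"party:goldman", "book:B7", "ccy:GBP",
                      "ledgerbook:L1"}
assert "party:unused" not in replicated
```

Saving the trade triggers its first-level references, each of those triggers its own references, and dimensions never reached (like the unused party) are never replicated.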
Limitations of this approach
• Data set size: the size of the connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications
• At one end of the scale are the huge shared-nothing architectures. These favour scalability
• At the other end are in-memory architectures, ideally using a single address space
• You can blend the two approaches (for example ODC)
• ODC attacks the Distributed Join Problem in an unusual way
• By balancing Replication and Partitioning, we can do any join in a single step
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern
The End
• Further details online: http://www.benstopford.com
• Questions?
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Denormalisation implies the duplication of some
sub-entities
hellipand that means managing consistency over
lots of copies
hellipand all the duplication means you run out of space really quickly
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops make it slow
- OK – what if we held it all together, "Denormalised"?
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- …and that means managing consistency over lots of copies
- …and all the duplication means you run out of space really quickly
- Space issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multiple machines
- Binding them back together involves a "distributed join" => lots of network hops (2)
- Whereas in the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the speed of a denormalised one
- Looking more closely: why does normalisation mean we have to spread data around the cluster?
- It's all about the keys
- We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are big, dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key)
- Looking at the data
- We remember we are a grid: we should avoid the distributed join
- … so we only want to 'join' data that is in the same process
- So we prescribe different physical storage for Facts and Dimensions
- Facts are partitioned, dimensions are replicated
- Facts are partitioned, dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does this help us to run queries without distributed joins?
- What would this look like without this pattern?
- But by balancing Replication and Partitioning we don't need all those hops
- Stage 1 Focus on the where clause: Where Cost Centre = 'CC1'
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they are collocated
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do this…
- …and this…
- …and this
- …without the problems of this…
- …or this
- all at the speed of this… well, almost
- Slide 121
- But there is a fly in the ointment…
- I lied earlier: these aren't all Facts
- We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions, a large majority are never used
- If there are no Trades for Goldmans in the data store, then a Trade Query will never need the Goldmans Counterparty
- Looking at the Dimension data, some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed that 80% of data remains unused
- Slide 131
- As data is written to the data store we keep our 'Connected Caches' up to date
- The Replicated Layer is updated by recursing through the arcs on the domain model
- Saving a trade causes all its 1st-level references to be triggered
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
…and that means managing consistency over lots of copies
…and all the duplication means you run out of space really quickly
Space issues are exaggerated further when data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
…and you need versioning to do MVCC
And reconstituting a previous time slice becomes very difficult
(diagram: scattered Trade, Party and Trader versions)
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a "distributed join" => lots of network hops
Trade
Party
Trader
Trade
Party
Trader
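The cost of that distributed join can be sketched with a little arithmetic. This is a hypothetical illustration, not a measurement: the 500 µs hop cost and the entity counts are assumptions, but they show how round trips dominate once Trade, Party and Trader live on different machines.

```java
// Hypothetical sketch of why a distributed join is slow: when Trade,
// Party and Trader live on different machines, rebuilding each object
// graph costs one network round trip per referenced entity.
public class DistributedJoinCost {

    // One hop to fetch the fact itself, plus one per related entity.
    public static long hops(long trades, int relationsPerTrade) {
        return trades * (1 + relationsPerTrade);
    }

    public static void main(String[] args) {
        long h = hops(1_000, 2);       // Trade -> Party, Trade -> Trader
        long assumedHopMicros = 500;   // assumed cost of one network hop
        System.out.println(h + " hops, ~"
                + (h * assumedHopMicros) / 1000 + " ms of network time");
    }
}
```

Even with modest fan-out, the hop count (and hence latency) grows with every relation that has to be resolved remotely.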
Whereas in the denormalised model the join is already done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?
It's all about the keys
We can collocate data with common keys, but if they crosscut, the only way to collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade: Partitioned
Party, Trader: Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts => Big, common keys
Dimensions => Small, crosscutting keys
We remember we are a grid We should avoid the
distributed join
… so we only want to 'join' data that is in the same process
Trades
MTMs
Common Key
Use a Key Assignment Policy (e.g. KeyAssociation in Coherence)
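The idea behind key association can be sketched in plain Java. KeyAssociation is the Coherence interface the slide names, but the classes below are illustrative stand-ins, not the Coherence API: an entry is routed to a partition by an *associated* key rather than its own key, so Facts sharing a trade id are collocated.

```java
// Sketch of key association: route by an associated key, not the
// entry's own key, so Trade and MTM for the same trade id land in
// the same partition. Class and method names are illustrative.
public class KeyRouting {
    static final int PARTITIONS = 13;

    interface HasAssociatedKey { Object associatedKey(); }

    // A Trade routes by its own trade id.
    static class TradeKey implements HasAssociatedKey {
        final long tradeId;
        TradeKey(long id) { tradeId = id; }
        public Object associatedKey() { return tradeId; }
    }

    // An MTM has its own composite key, but routes by the parent trade id.
    static class MtmKey implements HasAssociatedKey {
        final long tradeId; final int version;
        MtmKey(long id, int v) { tradeId = id; version = v; }
        public Object associatedKey() { return tradeId; }
    }

    public static int partitionOf(HasAssociatedKey key) {
        return Math.floorMod(key.associatedKey().hashCode(), PARTITIONS);
    }

    public static void main(String[] args) {
        // Same associated key => same partition => the Trade/MTM join is local
        System.out.println(partitionOf(new TradeKey(42))
                        == partitionOf(new MtmKey(42, 7)));
    }
}
```

Because both keys hash on the trade id, any join between Trades and MTMs never has to cross a partition boundary.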
So we prescribe different physical storage for Facts
and Dimensions
Trade: Partitioned
Party, Trader: Replicated
Facts are partitioned, dimensions are replicated
Data Layer
Transactions
Cashflows
Query Layer
Mtms
Fact Storage (Partitioned)
Trade
Party, Trader
Facts are partitioned, dimensions are replicated
Transactions
Cashflows
Dimensions (replicated)
Mtms
Fact Storage (Partitioned)
Facts (distributed / partitioned)
The data volumes back this up as a sensible hypothesis
Facts => Big => Distribute
Dimensions => Small => Replicate
Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key
Replicate
Distribute
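The hybrid model can be sketched as a single node's view of the store: it holds only its share of the Facts, but a full replica of the small Dimensions. All names here are invented for illustration; a real grid replicates via its own membership protocol.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: each node stores its partition of the Facts
// plus a full replica of the Dimensions, so a Fact-to-Dimension join
// never leaves the node.
public class HybridNode {
    final int nodeId, nodeCount;
    final Map<Long, Long> trades = new HashMap<>();    // partitioned: tradeId -> partyId
    final Map<Long, String> parties = new HashMap<>(); // replicated dimension

    public HybridNode(int nodeId, int nodeCount) {
        this.nodeId = nodeId; this.nodeCount = nodeCount;
    }

    public boolean owns(long tradeId) {
        return Math.floorMod(Long.hashCode(tradeId), nodeCount) == nodeId;
    }

    public void putTrade(long tradeId, long partyId) {
        if (owns(tradeId)) trades.put(tradeId, partyId); // only the owner stores it
    }

    public void putParty(long partyId, String name) {
        parties.put(partyId, name);                      // every node stores it
    }

    // Local join: the dimension lookup is always in-process.
    public String partyForTrade(long tradeId) {
        Long partyId = trades.get(tradeId);
        return partyId == null ? null : parties.get(partyId);
    }

    public static void main(String[] args) {
        HybridNode node = new HybridNode(1, 2);
        node.putParty(7L, "Goldmans"); // dimension: replicated to every node
        node.putTrade(1L, 7L);         // fact: stored only on its owner
        System.out.println(node.partyForTrade(1L)); // Goldmans
    }
}
```

A node that does not own a trade simply has no copy of it; the dimension data, being small, is safe to hold everywhere.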
So how does this help us to run queries without distributed joins?
This query involves:
• Joins between Dimensions
• Joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern?
Get Cost Centers → Get LedgerBooks → Get SourceBooks → Get Transactions → Get MTMs → Get Legs → Get Cost Centers
(network time accumulates with each hop)
But by balancing Replication and Partitioning we don't need all those hops
Stage 1 Focus on the where clause
Where Cost Centre = 'CC1'
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 2 Join the facts together efficiently as we know they are collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 3 Bind relevant dimensions to the result
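The three stages can be sketched end to end. The data shapes below are invented (a tiny pre-joined tradeId → cost-centre map stands in for the dimension joins); the point is *where* each join runs: dimensions in the query layer, facts locally per partition, dimensions again to decorate the result.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hedged sketch of the three query stages for
//   Select Transaction, MTM ... Where Cost Centre = 'CC1'
public class ThreeStageQuery {
    // Replicated dimension data in the query layer: tradeId -> cost centre
    static final Map<Long, String> costCentreOf =
        Map.of(1L, "CC1", 2L, "CC2", 3L, "CC1");
    // Partitioned facts, all keyed by tradeId (so fact joins are collocated)
    static final Map<Long, String> transactions =
        Map.of(1L, "txn-1", 2L, "txn-2", 3L, "txn-3");
    static final Map<Long, Double> mtms =
        Map.of(1L, 10.0, 2L, 20.0, 3L, 30.0);

    public static List<String> query(String costCentre) {
        // Stage 1: join the replicated dimensions locally to get fact keys
        List<Long> keys = costCentreOf.entrySet().stream()
            .filter(e -> e.getValue().equals(costCentre))
            .map(Map.Entry::getKey).sorted().collect(Collectors.toList());
        // Stage 2: join the facts; same key => same partition => local join
        // Stage 3: bind the dimension values back onto each result row
        return keys.stream()
            .map(k -> transactions.get(k) + " mtm=" + mtms.get(k) + " cc=" + costCentre)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(query("CC1")); // rows for trades 1 and 3 only
    }
}
```

No stage ships keys or intermediate results across the network: stage 1 reads replicated data, and stages 2 and 3 operate on collocated entries.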
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do this…
Trade
Party
Trader
Trade
Party
Trader
…and this…
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
…without the problems of this…
…or this
all at the speed of this… well, almost
But there is a fly in the ointment…
I lied earlier: these aren't all Facts
Facts
Dimensions
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"
If there are no Trades for Goldmans in the data store, then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80% of data remains unused
So we only replicate 'Connected' or 'Used' dimensions
As data is written to the data store we keep our 'Connected Caches' up to date
Data Layer
Dimension Caches (Replicated)
Transactions
Cashflows
Processing Layer
Mtms
Fact Storage (Partitioned)
As new Facts are added, relevant Dimensions that they reference are moved to processing-layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all its 1st-level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)
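The trigger-and-recurse step can be sketched as a graph walk. The domain model below (trade → party, book, ccy; party → ledger book) is invented for illustration: saving a fact walks its foreign keys and replicates only the dimensions actually reachable from it.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of the Connected Replication trigger: recurse through a saved
// fact's references and collect only the dimensions it actually touches.
public class ConnectedReplication {
    // key -> keys it references (the arcs of the domain model)
    static final Map<String, List<String>> refs = Map.of(
        "trade:T1",      List.of("party:P1", "book:B1", "ccy:USD"),
        "party:P1",      List.of("ledgerBook:L1"),
        "book:B1",       List.of(),
        "ccy:USD",       List.of(),
        "ledgerBook:L1", List.of(),
        "party:P2",      List.of());   // never referenced: stays unreplicated

    // Recurse from a saved fact, collecting the connected dimensions.
    public static Set<String> connectedFrom(String savedKey) {
        Set<String> connected = new TreeSet<>();
        collect(savedKey, connected);
        return connected;
    }

    private static void collect(String key, Set<String> connected) {
        for (String ref : refs.getOrDefault(key, List.of()))
            if (connected.add(ref))    // recurse only on first sight
                collect(ref, connected);
    }

    public static void main(String[] args) {
        // party:P2 has no trade pointing at it, so it is never replicated
        System.out.println(connectedFrom("trade:T1"));
    }
}
```

The "first sight" check both terminates the recursion on cycles and ensures each dimension is pushed to the replicated layer once.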
Limitations of this approach
• Data set size: the size of the connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in-memory architectures, ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The End
• Further details online: http://www.benstopford.com
• Questions
Spaces issues are exaggerated further when
data is versioned
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
hellipand you need versioning to do MVCC
And reconstituting a previous time slice
becomes very diffi cultTrad
ePart
yTrade
r
Trade
Trade
Party
Party
Party
Trader
Trader
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1: Focus on the where clause: Where CostCentre = 'CC1'
Stage 1: Get the right keys to query the Facts
Join Dimensions in the Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where CostCentre = 'CC1'
[Diagram: partitioned storage holding Transactions, Cashflows, MTMs]
Stage 2: Cluster Join to get Facts
Join Dimensions in the Query Layer, then join Facts across the cluster
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where CostCentre = 'CC1'
Stage 2: Join the facts together efficiently, as we know they are collocated
[Diagram: partitioned storage holding Transactions, Cashflows, MTMs]
Stage 3: Augment raw Facts with relevant Dimensions
Join Dimensions in the Query Layer, join Facts across the cluster, then join Dimensions in the Query Layer again
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where CostCentre = 'CC1'
Stage 3: Bind relevant dimensions to the result
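The three stages above can be sketched in standalone Java, with plain maps standing in for the cluster. All entity names, keys, and the book-to-cost-centre mapping are illustrative assumptions, not ODC's real model; the point is only the shape of the algorithm: dimensions resolve the where clause locally, facts join by their shared key, then dimension values are bound onto the result.

```java
import java.util.*;
import java.util.stream.*;

// Sketch of the three-stage query, cluster replaced by in-memory maps.
public class ThreeStageQuery {
    record Transaction(long tradeId, String bookId, double amount) {}
    record Mtm(long tradeId, double value) {}
    record Row(Transaction txn, Mtm mtm, String costCentre) {}

    // Replicated dimension: every query node holds a full copy.
    static final Map<String, String> costCentreOfBook =
            Map.of("B1", "CC1", "B2", "CC2");

    // Partitioned facts, both keyed by the common trade id, so a trade's
    // Transaction and MTM always live in the same partition.
    static final Map<Long, Transaction> transactions = Map.of(
            1L, new Transaction(1L, "B1", 100.0),
            2L, new Transaction(2L, "B2", 250.0));
    static final Map<Long, Mtm> mtms =
            Map.of(1L, new Mtm(1L, 90.0), 2L, new Mtm(2L, 240.0));

    static List<Row> query(String costCentre) {
        // Stage 1: resolve the where clause using only replicated
        // dimensions, yielding the keys that select the facts.
        Set<String> books = costCentreOfBook.entrySet().stream()
                .filter(e -> e.getValue().equals(costCentre))
                .map(Map.Entry::getKey).collect(Collectors.toSet());

        // Stage 2: join fact to fact. Both maps share the trade-id key,
        // so on the grid this join never leaves the partition.
        // Stage 3: bind the relevant dimension values onto each row.
        return transactions.values().stream()
                .filter(t -> books.contains(t.bookId()))
                .map(t -> new Row(t, mtms.get(t.tradeId()),
                                  costCentreOfBook.get(t.bookId())))
                .toList();
    }

    public static void main(String[] args) {
        List<Row> rows = query("CC1");
        System.out.println(rows.size());               // 1
        System.out.println(rows.get(0).mtm().value()); // 90.0
    }
}
```

No stage ships keys or intermediate results between nodes: Stage 1 is dimension-only and local, and Stage 2's lookup by trade id is, on the real grid, a partition-local operation.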
Bringing it together
[Diagram: a Java client API over Replicated Dimensions and Partitioned Facts]
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and building intermediate results
We get to do this… [join normalised object graphs: Trade, Party, Trader]
…and this… [keep independent versions: Trade, Party, Trader at Versions 1 to 4]
…and this… [many Trades sharing the same Party and Trader instances]
…without the problems of this…
…or this…
…all at the speed of this… well, almost
But there is a fly in the ointment…
I lied earlier. These aren't all Facts
[Diagram: the entity graph split into Facts and Dimensions]
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem
Fortunately there is a simple solution: the Connected Replication Pattern
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"
If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty
Looking at the Dimension data, some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80% of data remains unused
So we only replicate 'Connected' or 'Used' dimensions
As data is written to the data store, we keep our 'Connected Caches' up to date
[Diagram: Data Layer with Fact Storage (partitioned): Transactions, Cashflows, MTMs; Processing Layer with Dimension Caches (replicated)]
As new Facts are added, relevant Dimensions that they reference are moved to processing-layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all its 1st-level references to be triggered
[Diagram: 'Save Trade' hits the Partitioned Cache; the Cache Store fires a trigger. Data Layer (all normalised): Trade referencing Party, Alias, Source Book, Ccy. Query Layer: connected dimension caches]
This updates the connected caches
[Diagram: Party, Alias, Source Book and Ccy are copied from the Data Layer (all normalised) into the Query Layer's connected dimension caches]
The process recurses through the object graph
[Diagram: Trade references Party, Alias, Source Book, Ccy; Party in turn references LedgerBook, so it is pulled in too. Data Layer (all normalised); Query Layer (with connected dimension caches)]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)
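The recursion is simple enough to sketch. The graph below is hypothetical (keys like "trade:1" and "party:GS" are invented for the example, and the real arcs come from the domain model's foreign keys); the mechanism is the one described: a fact write triggers a walk over its references, and only dimensions actually reached are copied to the replicated caches.

```java
import java.util.*;

// Sketch of the Connected Replication pattern: on a fact write, recurse
// through the foreign keys and replicate only the dimensions reached.
public class ConnectedReplication {
    // key -> keys of the entities it references (the domain model's arcs)
    static final Map<String, List<String>> refs = Map.of(
            "trade:1", List.of("party:GS", "ccy:USD"),
            "party:GS", List.of("ledgerBook:LB7"),
            "ledgerBook:LB7", List.of(),
            "ccy:USD", List.of(),
            "party:UNUSED", List.of());   // never referenced by any fact

    // Contents of the replicated dimension caches.
    static final Set<String> replicated = new HashSet<>();

    // Fired by the cache-store trigger when a fact is written.
    static void onFactSaved(String key) {
        for (String dim : refs.getOrDefault(key, List.of()))
            if (replicated.add(dim))   // copy to the replicated caches once
                onFactSaved(dim);      // then recurse through its own arcs
    }

    public static void main(String[] args) {
        onFactSaved("trade:1");
        System.out.println(replicated);
        // party:UNUSED is never replicated: it is not 'connected'.
    }
}
```

The `Set` makes the walk idempotent: a dimension already in the replicated caches stops the recursion, so repeated fact writes cost almost nothing, and unconnected dimensions (like the unused party above) are never shipped anywhere.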
Limitations of this approach:
• Data set size: the size of connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low-latency or very high-throughput applications
• At one end of the scale are the huge shared-nothing architectures. These favour scalability
• At the other end are in-memory architectures, ideally using a single address space
• You can blend the two approaches (for example, ODC)
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, we can do any join in a single step
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern
The End
• Further details online: http://www.benstopford.com
• Questions?
So we want to hold entities separately
(normalised) to alleviate concerns around consistency and space usage
Remember this means the object graph will be split across multiple machines
Trade
Party
Trader
Trade
Party
Trader
Independently Versioned
Data is Singleton
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
Remember: this means the object graph will be split across multiple machines
(Diagram: two Trade/Party/Trader object graphs; each entity is independently versioned and each datum is a singleton)
Binding them back together involves a "distributed join" => lots of network hops
(Diagram: Trade, Party and Trader objects being re-joined across machines)
Whereas in the denormalised model the join is already done
So what we want are the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern is all about
Looking more closely: why does normalisation mean we have to spread data around the cluster? Why can't we hold it all together?
It's all about the keys
We can collocate data with common keys, but if the keys crosscut, the only way to collocate is to replicate
(Diagram: entities sharing common keys vs. entities linked by crosscutting keys)
We tackle this problem with a hybrid model
(Diagram: Trade is partitioned; Party and Trader are replicated)
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are big, dimensions are small
Facts have one key that relates them all (used to partition)
Dimensions have many keys (which crosscut the partitioning key)
Looking at the data:
Facts => big, common keys
Dimensions => small, crosscutting keys
We remember we are a grid: we should avoid the distributed join
… so we only want to 'join' data that is in the same process
(Diagram: Trades and MTMs share a common key; a key assignment policy, e.g. KeyAssociation in Coherence, routes them to the same partition)
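The routing idea can be sketched in a few lines. Coherence's real mechanism is the KeyAssociation interface (a key exposes `getAssociatedKey()` and the partition is chosen from the association); the hypothetical `Affinity` class and `MtmKey` record below only illustrate the concept and are not the actual API:

```java
import java.util.Objects;

// Sketch of key affinity: keys that declare an "associated" key are routed by
// that association, so dependent facts land in the same partition as their parent.
public class Affinity {
    // A key that declares which partitioning key it should ride along with.
    public interface Associated { String associatedKey(); }

    // An MTM is stored under its own id but partitioned by its parent trade id.
    public record MtmKey(String mtmId, String tradeId) implements Associated {
        public String associatedKey() { return tradeId; }
    }

    // Route a key to one of n partitions; associated keys route via their parent.
    public static int partitionOf(Object key, int partitions) {
        Object routing = (key instanceof Associated a) ? a.associatedKey() : key;
        return Math.floorMod(Objects.hashCode(routing), partitions);
    }
}
```

With this in place a Trade keyed "T1" and every MTM carrying `tradeId = "T1"` hash to the same partition, so the Trade-to-MTM join is process-local.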
So we prescribe different physical storage for Facts and Dimensions
(Diagram: Trade partitioned; Party and Trader replicated)
Facts are partitioned, dimensions are replicated
(Diagram: Transactions, Cashflows and MTMs held as partitioned fact storage in the data layer; Dimensions replicated across the query layer. Facts => distribute/partition, Dimensions => replicate)
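The split can be pictured as a single storage node: each node owns one partition of the facts but a complete replica of the dimensions, so a fact-to-dimension join never leaves the process. A hypothetical sketch (class and field names are illustrative, not ODC's actual code):

```java
import java.util.HashMap;
import java.util.Map;

// One storage node in the grid: a subset of the facts, a full copy of the dimensions.
public class Node {
    private final Map<String, String> factPartition = new HashMap<>();    // tradeId -> trade
    private final Map<String, String> dimensionReplica = new HashMap<>(); // partyId -> party

    public void storeFact(String tradeId, String trade) { factPartition.put(tradeId, trade); }
    public void replicateDimension(String partyId, String party) { dimensionReplica.put(partyId, party); }

    // Join a locally held fact to a dimension entirely within this process.
    public String joinLocally(String tradeId, String partyId) {
        String trade = factPartition.get(tradeId);
        String party = dimensionReplica.get(partyId);
        return (trade == null || party == null) ? null : trade + "/" + party;
    }

    public static String demo() {
        Node node = new Node();
        node.storeFact("T1", "IRS Swap");         // this node owns trade T1
        node.replicateDimension("P1", "Goldman"); // every node holds party P1
        return node.joinLocally("T1", "P1");
    }
}
```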
The data volumes back this up as a sensible hypothesis
Facts => big => distribute
Dimensions => small => replicate
Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and to replicate small stuff whose keys can't map to our partitioning key
So how does this help us to run queries without distributed joins?
This query involves: joins between Dimensions, and joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where CostCentre = 'CC1'
What would this look like without this pattern?
(Diagram: a sequence of network hops spread over time: get Cost Centres, get LedgerBooks, get SourceBooks, get Transactions, get MTMs, get Legs; each is a network round trip)
But by balancing Replication and Partitioning we don't need all those hops
Stage 1: Focus on the where clause: Where CostCentre = 'CC1'
Stage 1: Get the right keys to query the Facts (join Dimensions in the Query Layer)
Stage 2: Cluster-join to get the Facts (join Facts across the cluster; we know they are collocated, so the join is efficient)
Stage 3: Augment the raw Facts with the relevant Dimensions (bind replicated dimension data to the result in the Query Layer)
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where CostCentre = 'CC1'
(The facts involved, Transactions, Cashflows and MTMs, sit in partitioned storage throughout)
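The three stages can be sketched end to end. The maps below stand in for the replicated dimension caches and the partitioned fact caches; all names and data are invented for the example:

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch of the three query stages against toy data.
public class ThreeStageQuery {
    // Replicated dimensions, available in every query-layer process.
    static final Map<String, List<String>> BOOKS_BY_COST_CENTRE = Map.of("CC1", List.of("B1", "B2"));
    static final Map<String, List<String>> TRADE_KEYS_BY_BOOK =
            Map.of("B1", List.of("T1"), "B2", List.of("T2"));
    static final Map<String, String> REF_DATA = Map.of("T1", "ref1", "T2", "ref2");
    // Partitioned facts: Trade and MTM share the trade key, so they are collocated.
    static final Map<String, String> TRADES = Map.of("T1", "trade1", "T2", "trade2");
    static final Map<String, String> MTMS = Map.of("T1", "mtm1", "T2", "mtm2");

    public static List<String> query(String costCentre) {
        // Stage 1: join the dimensions in the query layer to get the fact keys.
        List<String> keys = BOOKS_BY_COST_CENTRE.getOrDefault(costCentre, List.of()).stream()
                .flatMap(book -> TRADE_KEYS_BY_BOOK.get(book).stream())
                .toList();
        // Stage 2: fetch the collocated facts by key (one parallel cluster hop).
        // Stage 3: bind the replicated dimension data to each result, locally.
        return keys.stream()
                .map(k -> TRADES.get(k) + "+" + MTMS.get(k) + "+" + REF_DATA.get(k))
                .toList();
    }
}
```

The point of the sketch is the shape: exactly one cluster-wide operation (stage 2), with everything before and after it resolved against local, replicated data.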
Bringing it together
(Diagram: a Java client API over Replicated Dimensions and Partitioned Facts)
We never have to do a distributed join: all the big stuff is held partitioned, and we can join without shipping keys around and without intermediate results
We get to do this… (normalised object graphs of Trade, Party and Trader, each held as a singleton)
…and this… (the same graphs independently versioned, Versions 1 to 4)
…and this… (sub-entities such as Parties and Traders shared between Trades)
…without the problems of this… …or this…
…all at the speed of this… well, almost
But there is a fly in the ointment…
I lied earlier: these aren't all Facts
(Diagram: some entities are really Dimensions; this one has a different key to the Facts, and it's BIG)
We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem
Fortunately there is a simple solution: the Connected Replication pattern
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"
If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty
Looking at the Dimension data, some are quite large
But Connected Dimension data is tiny by comparison
One recent independent study from the database community showed that 80% of data remains unused
So we only replicate 'Connected' or 'Used' dimensions
As data is written to the data store we keep our 'Connected Caches' up to date
(Diagram: Transactions, Cashflows and MTMs in partitioned fact storage in the data layer; replicated dimension caches in the processing layer)
As new Facts are added, the relevant Dimensions that they reference are moved to the processing-layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all its first-level references to be triggered
(Diagram: "Save Trade" hits the partitioned cache; a cache-store trigger fires for the Trade's Party, Alias, Source, Book and Ccy references. Data layer: all normalised. Query layer: with connected dimension caches)
This updates the connected caches
(Diagram: the Party, Alias, Source, Book and Ccy entries are copied into the query layer's connected dimension caches)
The process recurses through the object graph
(Diagram: the recursion continues to the next level of references, pulling in a further Party and a LedgerBook)
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
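The trigger logic amounts to a graph walk. A sketch, assuming a made-up miniature of the Trade domain model's foreign-key arcs (the real implementation works against the cache store, not an in-memory map):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Connected Replication sketch: when a fact is saved, walk the foreign-key
// arcs transitively and collect the dimensions it actually references.
public class ConnectedReplication {
    // Entity -> entities it references (illustrative domain model).
    static final Map<String, List<String>> REFERENCES = Map.of(
            "Trade", List.of("Party", "Book", "Ccy"),
            "Party", List.of("Alias"),
            "Book", List.of("LedgerBook"));

    public static Set<String> connectedDimensions(String fact) {
        Set<String> connected = new TreeSet<>();
        Deque<String> toVisit = new ArrayDeque<>(REFERENCES.getOrDefault(fact, List.of()));
        while (!toVisit.isEmpty()) {
            String dim = toVisit.pop();
            if (connected.add(dim)) { // recurse only through arcs we haven't seen
                toVisit.addAll(REFERENCES.getOrDefault(dim, List.of()));
            }
        }
        return connected; // these, and only these, dimensions get replicated
    }
}
```

Saving a Trade thus replicates its Party, Book and Ccy, then the Party's Alias and the Book's LedgerBook; an unreferenced counterparty is never copied.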
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)
Limitations of this approach
• Data set size: the size of the connected dimensions limits scalability
• Joins are only supported between Facts that can share a partitioning key (but any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications
• At one end of the scale are the huge shared-nothing architectures; these favour scalability
• At the other end are in-memory architectures, ideally using a single address space
• You can blend the two approaches (for example ODC)
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so we can do any join in a single step
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern
The End
• Further details online: http://www.benstopford.com
• Questions?
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Binding them back together involves a ldquodistributed joinrdquo =gt
Lots of network hops
Trade
Party
Trader
Trade
Party
Trader
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Whereas the denormalised model the join is already
done
So what we want is the advantages of a normalised store at the speed of a denormalised one
This is what using Snowflake Schemas and the Connected Replication pattern
is all about
Looking more closely Why does normalisation mean we have to spread data around the cluster Why
canrsquot we hold it all together
Itrsquos all about the keys
We can collocate data with common keys but if they crosscut the only way to
collocate is to replicate
Common Keys
Crosscutting
Keys
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned; dimensions are replicated
Data Layer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned; dimensions are replicated
Transactions
Cashflows
Dimensions (replicate)
Mtms
Fact Storage(Partitioned)
Facts (distribute/partition)
The data volumes back this up as a sensible hypothesis
Facts => big => distribute
Dimensions => small => replicate
Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key
Replicate
Distribute
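The hybrid layout can be sketched as one node of the store (all names and the routing scheme here are illustrative, not ODC's actual implementation): a fact is kept only if its partitioning key hashes to this node, while every node keeps a full copy of the dimensions.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: one node of the hybrid store. A fact is stored only on the
// node its partitioning key hashes to; a dimension is stored on every node.
public class HybridNode {
    final int nodeId;
    final int clusterSize;
    final Map<String, String> facts = new HashMap<>();      // partitioned
    final Map<String, String> dimensions = new HashMap<>(); // replicated

    HybridNode(int nodeId, int clusterSize) {
        this.nodeId = nodeId;
        this.clusterSize = clusterSize;
    }

    boolean owns(String partitioningKey) {
        return Math.floorMod(partitioningKey.hashCode(), clusterSize) == nodeId;
    }

    void putFact(String tradeId, String fact) {
        if (owns(tradeId)) facts.put(tradeId, fact);   // lands on exactly one node
    }

    void putDimension(String key, String value) {
        dimensions.put(key, value);                    // broadcast to every node
    }

    public static void main(String[] args) {
        HybridNode n0 = new HybridNode(0, 2);
        HybridNode n1 = new HybridNode(1, 2);
        n0.putFact("T1", "trade");  n1.putFact("T1", "trade");
        n0.putDimension("P1", "party"); n1.putDimension("P1", "party");
        System.out.println(n0.facts.size() + n1.facts.size());           // 1
        System.out.println(n0.dimensions.size() + n1.dimensions.size()); // 2
    }
}
```

One copy of each big fact, clusterSize copies of each small dimension: the memory cost of replication is only paid where it is cheap.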
So how does this help us to run queries without distributed joins?
This query involves:
• Joins between Dimensions
• Joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern
Get Cost Centers → Get LedgerBooks → Get SourceBooks → Get Transactions → Get MTMs → Get Legs → Get Cost Centers
(each step a network hop, spread over time)
But by balancing Replication and Partitioning we don't need all those hops
Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 2: Join the facts together efficiently as we know they are collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 3 Bind relevant dimensions to the result
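The three stages can be sketched end-to-end with plain in-memory maps (all table contents and names are invented for illustration): Stage 1 joins the replicated dimensions to turn the where clause into fact keys, Stage 2 joins the collocated facts by those keys, Stage 3 binds dimension data back onto each row.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Sketch of the three query stages against plain in-memory maps.
public class SnowflakeQueryDemo {
    // Replicated dimension: book -> cost centre
    static final Map<String, String> BOOK_TO_CC =
        Map.of("B1", "CC1", "B2", "CC2");
    // Partitioned facts, both keyed by tradeId so they are collocated
    static final Map<String, String> TRADE_TO_BOOK =
        Map.of("T1", "B1", "T2", "B2", "T3", "B1");
    static final Map<String, Double> TRADE_TO_MTM =
        Map.of("T1", 10.0, "T2", -4.0, "T3", 7.5);

    static Map<String, Double> mtmsForCostCentre(String cc) {
        // Stage 1: join dimensions in the query layer to get fact keys
        Set<String> keys = TRADE_TO_BOOK.entrySet().stream()
                .filter(e -> cc.equals(BOOK_TO_CC.get(e.getValue())))
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
        // Stage 2: join facts; same key, same partition, no network hop
        Map<String, Double> result = new TreeMap<>();
        for (String k : keys) result.put(k, TRADE_TO_MTM.get(k));
        // Stage 3 would now bind the dimension rows onto each result row
        return result;
    }

    public static void main(String[] args) {
        System.out.println(mtmsForCostCentre("CC1")); // {T1=10.0, T3=7.5}
    }
}
```

The only cluster-wide work is the fact join in Stage 2, and that never ships keys between partitions because Trades and MTMs share one partitioning key.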
Bringing it together
Java client API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and having intermediate results
We get to do this…
[Diagram: Trades held together with their Party and Trader]
…and this…
[Diagram: Trade, Party and Trader held together across Versions 1–4]
…and this
[Diagram: many Trades, Parties and Traders, still collocated]
…without the problems of this…
…or this
…all at the speed of this… well, almost
But there is a fly in the ointment…
I lied earlier: these aren't all Facts
Facts
Dimensions
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem
Fortunately there is a simple solution: the Connected Replication Pattern
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80% of data remains unused
So we only replicate 'Connected' or 'Used' dimensions
As data is written to the data store we keep our 'Connected Caches' up to date
[Diagram: Data Layer — Transactions, Cashflows, Mtms in Fact Storage (Partitioned); Processing Layer — Dimension Caches (Replicated)]
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all its 1st-level references to be triggered
[Diagram: Save Trade → Trigger → Cache Store → Partitioned Cache; the Trade in the Data Layer (all normalised) references Party, Alias, Source Book and Ccy, which feed the Query Layer's connected dimension caches]
This updates the connected caches
[Diagram: the Trade's direct references — Party, Alias, Source Book, Ccy — are copied from the Data Layer (all normalised) into the Query Layer's connected dimension caches]
The process recurses through the object graph
[Diagram: recursion continues through the graph, reaching further dimensions such as Party and LedgerBook]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)
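The recursion above can be sketched in a few lines (the object model, keys and field names here are invented for illustration, not ODC's real schema): on save, walk each entity's foreign-key arcs and replicate only the dimensions actually reached.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch: recurse through a fact's references, replicating only the
// dimensions that are actually 'connected' to stored facts.
public class ConnectedReplication {
    // key -> keys of the entities it references (the arcs of the model)
    static final Map<String, List<String>> REFS = Map.of(
        "trade:T1",      List.of("party:P1", "book:B1"),
        "party:P1",      List.of("ledgerBook:L1"),
        "book:B1",       List.of(),
        "ledgerBook:L1", List.of());

    // the replicated 'connected cache' of dimension keys
    static final Set<String> connectedCache = new TreeSet<>();

    static void onSave(String key) {
        for (String ref : REFS.getOrDefault(key, List.of())) {
            // recurse only the first time a dimension becomes connected
            if (connectedCache.add(ref)) onSave(ref);
        }
    }

    public static void main(String[] args) {
        onSave("trade:T1");
        System.out.println(connectedCache);
        // [book:B1, ledgerBook:L1, party:P1]
    }
}
```

A Party with no Trades never enters the cache, which is exactly why the replicated footprint stays an order of magnitude smaller than the full dimension data.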
Limitations of this approach
• Data set size: the size of connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning we can do any join in a single step
Partitioned Storage
Conclusion
With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern
The End
• Further details online: http://www.benstopford.com
• Questions
We tackle this problem with a hybrid model
Trade
PartyTrader
Partitioned
Replicated
We adapt the concept of a Snowflake Schema
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2: Cluster-join to get the Facts: Dimensions are joined in the Query Layer, then Facts are joined across the cluster.
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 2: Join the facts together efficiently, as we know they are collocated in the partitioned storage (Transactions, Cashflows, MTMs).
Stage 3: Augment the raw Facts with the relevant Dimensions: join Dimensions in the Query Layer, join Facts across the cluster, then join Dimensions in the Query Layer once more.
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 3: Bind the relevant dimensions to the result.
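The three stages can be sketched as a single-process miniature. This is an illustrative toy, not ODC's API: the entity names and maps are made up, but the shape is faithful. Stage 1 evaluates the where clause against replicated dimensions in-process, stage 2 joins facts that share a partition key, and stage 3 decorates the result with replicated dimension data.

```java
import java.util.*;
import java.util.stream.*;

public class HybridQuery {
    // Replicated dimensions: present in full on every node (assumed tiny).
    static Map<String, String> costCentreByBook =
        Map.of("BOOK-1", "CC1", "BOOK-2", "CC2");
    static Map<String, String> ccyName =
        Map.of("USD", "US Dollar", "GBP", "Pound Sterling");

    // Partitioned facts, all keyed (directly or by association) on trade id.
    record Trade(String id, String book, String ccy) {}
    record Mtm(String tradeId, double value) {}
    static List<Trade> trades =
        List.of(new Trade("T1", "BOOK-1", "USD"), new Trade("T2", "BOOK-2", "GBP"));
    static List<Mtm> mtms = List.of(new Mtm("T1", 10.0), new Mtm("T2", -3.5));

    static List<String> query(String costCentre) {
        // Stage 1: resolve the where clause against replicated dimensions,
        // in-process, yielding the keys we need to query the facts.
        Set<String> books = costCentreByBook.entrySet().stream()
            .filter(e -> e.getValue().equals(costCentre))
            .map(Map.Entry::getKey).collect(Collectors.toSet());
        // Stage 2: join the facts; in the real grid each node joins only
        // its own collocated slice, since trades and MTMs share a key.
        Map<String, Double> mtmByTrade = mtms.stream()
            .collect(Collectors.toMap(Mtm::tradeId, Mtm::value));
        // Stage 3: bind replicated dimension data onto the raw facts.
        return trades.stream()
            .filter(t -> books.contains(t.book()))
            .map(t -> t.id() + " mtm=" + mtmByTrade.get(t.id())
                      + " ccy=" + ccyName.get(t.ccy()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(query("CC1")); // [T1 mtm=10.0 ccy=US Dollar]
    }
}
```

Nothing in `query` ever ships keys or intermediate results to another node; that is the whole point of the replication/partitioning balance.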
Bringing it together: a Java client API over Replicated Dimensions and Partitioned Facts. We never have to do a distributed join. So all the big stuff is held partitioned, and we can join without shipping keys around and without intermediate results.
We get to do this… (joined Trade, Party, Trader graphs)
…and this… (Trade, Party, Trader at Versions 1 through 4)
…and this… (many Trades sharing the same Parties and Traders)
…without the problems of this… or this…
all at the speed of this… well, almost.
But there is a fly in the ointment… I lied earlier: these aren't all Facts. Some are Dimensions. This is a dimension:
bull It has a different key to the Facts
bull And it's BIG
We can't replicate really big stuff… we'll run out of space => big Dimensions are a problem.
Fortunately there is a simple solution: the Connected Replication Pattern.
Whilst there are lots of these big dimensions, the large majority are never used; they are not all "connected". If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.
Looking at the Dimension data, some are quite large. But Connected Dimension data is tiny by comparison. One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'Connected' (or 'Used') dimensions.
As data is written to the data store we keep our 'Connected Caches' up to date. (Diagram: replicated Dimension Caches in the Processing Layer sit over partitioned Fact Storage of Transactions, Cashflows and MTMs in the Data Layer.)
As new Facts are added, the relevant Dimensions that they reference are moved to the processing-layer caches. The Replicated Layer is updated by recursing through the arcs on the domain model when facts change: saving a trade causes all its first-level references to be triggered.
(Diagram: Save Trade hits the Partitioned Cache; a Cache Store trigger fires for the Trade's first-level references, Party, Alias, Source Book and Ccy, in the Data Layer (all normalised), beneath the Query Layer (with connected dimension caches).)
This updates the connected caches: Trade, Party, Alias, Source Book and Ccy move from the Data Layer (all normalised) into the Query Layer's connected dimension caches.
The process recurses through the object graph: the Party pulls in its LedgerBook, the Source Book pulls in its own Party, and so on, from the Data Layer (all normalised) into the Query Layer's connected dimension caches.
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated. With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
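The recursion at the heart of the pattern is an ordinary graph traversal over foreign-key arcs. The sketch below is a toy under assumed names (`refs` as an adjacency map of string keys), not ODC's trigger machinery, but it shows what "connected" means operationally: everything reachable from a saved fact belongs in the replicated caches, and nothing else does.

```java
import java.util.*;

public class ConnectedReplication {
    // Foreign-key arcs of the domain model: entity key -> referenced keys.
    static Map<String, List<String>> refs = Map.of(
        "trade:T1", List.of("party:P1", "book:B1", "ccy:USD"),
        "party:P1", List.of("ledgerbook:L1"),
        "book:B1",  List.of("party:P2"));

    // When a fact is saved, recurse through its references; every
    // dimension reached is "connected" and is replicated.
    static Set<String> connectedDimensions(String factKey) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(refs.getOrDefault(factKey, List.of()));
        while (!todo.isEmpty()) {
            String k = todo.pop();
            // The seen-set stops the recursion cycling on shared references.
            if (seen.add(k)) todo.addAll(refs.getOrDefault(k, List.of()));
        }
        return seen;
    }

    public static void main(String[] args) {
        // Saving T1 triggers its first-level references, then recurses.
        System.out.println(connectedDimensions("trade:T1"));
    }
}
```

A dimension row that no fact reaches (the "Goldmans Counterparty" with no trades) is simply never visited, so it is never replicated.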
Limitations of this approach:
bull Data set size: the size of the connected dimensions limits scalability.
bull Joins are only supported between Facts that can share a partitioning key (but any dimension join can be supported).
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications.
bull At one end of the scale are the huge shared-nothing architectures. These favour scalability.
bull At the other end are in-memory architectures, ideally using a single address space.
bull You can blend the two approaches (for example ODC).
bull ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join against partitioned storage in a single step.
bull With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
bull Further details online: http://www.benstopford.com
bull Questions
Taking the concept of Facts and Dimensions
Everything starts from a Core Fact (Trades for us)
Facts are Big dimensions are small
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid. We should avoid the distributed join
- … so we only want to 'join' data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned, dimensions are replicated
- Facts are partitioned, dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how do they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we don't need all
- Stage 1 Focus on the where clause Where Cost Centre = 'CC1'
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do this…
- …and this…
- and this
- …without the problems of this…
- …or this
- all at the speed of this… well almost
- Slide 121
- But there is a fly in the ointment…
- I lied earlier. These aren't all Facts
- We can't replicate really big stuff… we'll run out of space =>
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our 'Connected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all its 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Everything starts from a Core Fact (Trades for us)
Facts are Big, dimensions are small
Facts have one key that relates them all (used to partition)
Dimensions have many keys (which crosscut the partitioning key)
Looking at the data
Facts => Big, common keys
Dimensions => Small, crosscutting keys
We remember we are a grid. We should avoid the distributed join
… so we only want to 'join' data that is in the same process
Trades
MTMs
Common Key
Use a Key Assignment Policy (e.g. KeyAssociation in Coherence)
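As an illustration, here is a minimal plain-Java sketch of the key-association idea (not the real Coherence API): an MTM's key exposes the owning trade's id as its 'associated key', and partition assignment hashes only that, so a Trade and its MTMs always land in the same partition. All names in this sketch are invented.

```java
import java.util.Objects;

// Sketch of key association: partitioning hashes the associated (trade) key,
// never the MTM's own key, so related facts are collocated.
public class KeyAssociationSketch {

    // Key for an MTM fact; the associated key is the parent trade's id.
    static final class MtmKey {
        final String mtmId;
        final String tradeId;
        MtmKey(String mtmId, String tradeId) {
            this.mtmId = mtmId;
            this.tradeId = tradeId;
        }
        String associatedKey() {
            return tradeId; // partition on this, never on mtmId
        }
    }

    // Partition assignment depends only on the associated (trade) key.
    static int partitionOf(String associatedKey, int partitionCount) {
        return Math.floorMod(Objects.hashCode(associatedKey), partitionCount);
    }

    public static void main(String[] args) {
        MtmKey mtm = new MtmKey("MTM-42", "TRADE-7");
        int tradePartition = partitionOf("TRADE-7", 8);
        int mtmPartition   = partitionOf(mtm.associatedKey(), 8);
        // Same partition => the Trade-to-MTM 'join' is process-local.
        System.out.println(tradePartition == mtmPartition); // prints true
    }
}
```

Because both entities hash the same key, the grid can route a Trade and all its MTMs to one process, which is what makes the later 'cluster join' a local operation.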
So we prescribe different physical storage for Facts and Dimensions
Trade => Partitioned
Party, Trader => Replicated
Facts are partitioned, dimensions are replicated
Data Layer
Transactions
Cashflows
Query Layer
Mtms
Fact Storage (Partitioned)
Trade
Party, Trader
Facts are partitioned, dimensions are replicated
Transactions
Cashflows
Dimensions (replicate)
Mtms
Fact Storage (Partitioned)
Facts (distribute / partition)
The data volumes back this up as a sensible hypothesis
Facts => Big => Distribute
Dimensions => Small => Replicate
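A back-of-envelope sketch of why that split works: replicating an entity costs its size times the number of nodes, while partitioning it costs its size once. The byte counts and node count below are made-up figures for illustration only.

```java
// Illustrative storage-cost arithmetic for replicate-vs-partition decisions.
public class StorageCost {

    // Cluster-wide footprint of an entity replicated to every node.
    static long replicatedCost(long entityBytes, int nodes) {
        return entityBytes * nodes;
    }

    // Partitioned data is held once, spread across the cluster.
    static long partitionedCost(long entityBytes) {
        return entityBytes;
    }

    public static void main(String[] args) {
        long factBytes = 1_000_000_000L; // ~1 GB of trade facts (illustrative)
        long dimBytes  = 10_000_000L;    // ~10 MB of connected dimensions (illustrative)
        int  nodes     = 100;
        // Partitioning the big facts costs 1 GB in total; replicating them would cost 100 GB.
        System.out.println(partitionedCost(factBytes));      // 1000000000
        // Replicating the small dimensions everywhere is still only ~1 GB cluster-wide.
        System.out.println(replicatedCost(dimBytes, nodes)); // 1000000000
    }
}
```

With these (invented) numbers, replicating the small dimensions across 100 nodes costs about the same as storing the big facts once, which is why only the small, crosscutting entities are replicated.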
Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key
Replicate
Distribute
So how do they help us to run queries without distributed joins?
This query involves:
• Joins between Dimensions
• Joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern
Get Cost Centers
Get LedgerBooks
Get SourceBooks
Get Transactions
Get MTMs
Get Legs
Get Cost Centers
(diagram: each call is a separate network hop, accumulating network time)
But by balancing Replication and Partitioning we don't need all those hops
Stage 1 Focus on the where clause
Where Cost Centre = 'CC1'
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 2 Join the facts together efficiently as we know they are collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and having intermediate results
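The three query stages above can be sketched as a toy program, with plain in-process maps standing in for ODC's replicated dimension layer and partitioned fact layer. All the data and names here are invented for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Toy sketch of the three query stages over replicated dimensions and
// partitioned facts.
public class ThreeStageQuery {

    // Replicated dimension: trade id -> cost centre (a full copy sits on every node).
    static final Map<String, String> COST_CENTRE_DIM =
        Map.of("T1", "CC1", "T2", "CC1", "T3", "CC2");

    // Partitioned facts: trade id -> MTM value (spread across the cluster by trade key).
    static final Map<String, Double> MTM_FACTS =
        Map.of("T1", 10.0, "T2", 20.0, "T3", 30.0);

    static Map<String, Double> mtmsForCostCentre(String costCentre) {
        // Stage 1: resolve the where clause against replicated dimensions (no network hop).
        Set<String> factKeys = COST_CENTRE_DIM.entrySet().stream()
            .filter(e -> e.getValue().equals(costCentre))
            .map(Map.Entry::getKey)
            .collect(Collectors.toSet());
        // Stage 2: fetch the facts by partitioning key; collocated facts join locally.
        // Stage 3: bind the dimension values back onto the result in the query layer.
        return factKeys.stream()
            .collect(Collectors.toMap(k -> k, MTM_FACTS::get, (a, b) -> a, TreeMap::new));
    }

    public static void main(String[] args) {
        System.out.println(mtmsForCostCentre("CC1")); // {T1=10.0, T2=20.0}
    }
}
```

The point of the sketch: because the where clause resolves entirely in the (replicated) query layer, the only cluster operation is a single keyed fetch of collocated facts.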
We get to do thishellip
(diagram: a Trade object graph joined to its Party and Trader)
hellipand thishellip
(diagram: the Trade/Party/Trader graph held at Versions 1 to 4)
and this
(diagram: many Trades sharing the same Party and Trader dimensions)
…without the problems of this…
…or this
all at the speed of this… well, almost
But there is a fly in the ointment…
I lied earlier. These aren't all Facts
Facts
Dimensions
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected"
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80% of data remains unused
So we only replicate 'Connected' or 'Used' dimensions
As data is written to the data store we keep our 'Connected Caches' up to date
Data Layer
Dimension Caches (Replicated)
Transactions
Cashflows
Processing Layer
Mtms
Fact Storage (Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all its 1st-level references to be triggered
Trade
Party
Alias
Source Book
Ccy
Data Layer (All Normalised)
Query Layer (With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source Book
Ccy
Data Layer (All Normalised)
Query Layer (With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source Book
Ccy
Party
LedgerBook
Data Layer (All Normalised)
Query Layer (With connected dimension Caches)
'Connected Replication': A simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)
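The recursion the slides describe can be sketched in a few lines: saving a fact walks the foreign-key arcs of the domain model and collects every reachable ('connected') dimension, and only those are pushed to the replicated layer. The toy graph below is invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of the Connected Replication recursion over a toy domain model.
public class ConnectedReplication {

    // entity -> entities it references via foreign keys
    static final Map<String, List<String>> REFS = Map.of(
        "Trade", List.of("Party", "Book", "Ccy"),
        "Party", List.of("Alias"),
        "Book",  List.of("LedgerBook"),
        "Ccy",   List.of(),
        "Alias", List.of(),
        "LedgerBook", List.of());

    // Every dimension reachable from the saved fact, found by recursing the arcs.
    static Set<String> connectedDimensions(String fact) {
        Set<String> seen = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>(REFS.getOrDefault(fact, List.of()));
        while (!work.isEmpty()) {
            String d = work.pop();
            if (seen.add(d)) {                       // first visit: recurse one level deeper
                work.addAll(REFS.getOrDefault(d, List.of()));
            }
        }
        return seen; // only these get replicated; untouched dimensions never move
    }

    public static void main(String[] args) {
        System.out.println(connectedDimensions("Trade")); // [Alias, Book, Ccy, LedgerBook, Party]
    }
}
```

A dimension with no arc from any stored fact (the 'Goldmans Counterparty with no trades' case) is never visited, so it is never replicated.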
Limitations of this approach
• Data set size: Size of connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (But any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning we can do any join in a single step
Conclusion
With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern
The End
• Further details online: http://www.benstopford.com
• Questions
Facts have one key that relates them all (used to
partition)
Dimensions have many keys
(which crosscut the partitioning key)
Looking at the data
Facts=gtBig common keys
Dimensions=gtSmallcrosscutting Keys
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low-latency or very high-throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example, ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, it can do any join in a single step against partitioned storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
We remember we are a grid We should avoid the
distributed join
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2: Cluster join to get the Facts. Join the facts together efficiently across the cluster; as we know they are collocated, each join stays inside a partition.
Stage 3: Augment the raw Facts with the relevant Dimensions, binding them to the result in the query layer.
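Put end to end, the three stages can be sketched with plain java.util maps standing in for the grid's replicated and partitioned caches. All names and data here are hypothetical, and the dimension joins of Stage 1 are pre-flattened into a single lookup for brevity:

```java
import java.util.*;
import java.util.stream.*;

// Sketch of the query pipeline: a replicated dimension index (present in every
// query-layer process) plus a partitioned fact store.
class QuerySketch {
    // Stage 1 input: dimensions, replicated locally. Here the dimension joins
    // are pre-flattened to costCentre -> trade ids.
    static Map<String, Set<String>> tradeIdsByCostCentre =
            Map.of("CC1", Set.of("T1", "T2"));

    // Stage 2 input: facts, partitioned on tradeId (collocated via key association).
    static Map<String, Double> mtmByTradeId =
            Map.of("T1", 10.5, "T2", -3.2, "T3", 99.0);

    // Stage 1: resolve the where clause against replicated dimensions (no hops).
    // Stage 2: fan out once to the partitions holding those trade ids.
    // Stage 3: the caller binds dimension data to the result in the query layer.
    static Map<String, Double> mtmsForCostCentre(String cc) {
        return tradeIdsByCostCentre.getOrDefault(cc, Set.of()).stream()
                .collect(Collectors.toMap(id -> id, mtmByTradeId::get));
    }
}
```

In the real grid, Stage 2 is a single parallel scatter to the owning partitions rather than a local map lookup, but the shape of the data flow is the same.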
Bringing it together: a Java client API sitting over Replicated Dimensions and Partitioned Facts.
We never have to do a distributed join. So all the big stuff is held partitioned, and we can join without shipping keys around and holding intermediate results.
We get to do this… and this… and this… (joining Trades to their Parties and Traders, across versions 1 to 4, across whole object graphs) …without the problems of this… or this… all at the speed of a single process… well, almost.
But there is a fly in the ointment… I lied earlier: these aren't all Facts. Some are Dimensions. This is a dimension: it has a different key to the Facts, and it's BIG. We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.
Fortunately there is a simple solution: the Connected Replication pattern.
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected". If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty. Looking at the Dimension data, some are quite large. But Connected Dimension Data is tiny by comparison. One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'Connected' (or 'Used') dimensions.
As data is written to the data store we keep our 'Connected Caches' up to date.

[Diagram: the data layer (Transactions, Cashflows, MTMs in partitioned Fact Storage) feeds replicated Dimension Caches in the processing layer.]
As new Facts are added, the relevant Dimensions that they reference are moved to the processing-layer caches. The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.
Saving a trade causes all its first-level references to be triggered.

[Diagram: a Save Trade call hits the partitioned cache in the data layer (all normalised); a cache-store trigger fires for the Trade's direct references: Party, Alias, Source Book, Ccy.]
This updates the connected caches in the query layer (which holds the connected dimension caches).
The process recurses through the object graph: the Party pulls in its own references (LedgerBook, further Parties), and so on.
'Connected Replication' is a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
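That recursion can be sketched in a few lines. The adjacency map below is a hypothetical stand-in for the domain model's foreign keys; in the real system the trigger fires inside the data layer's cache store when a fact is written:

```java
import java.util.*;

// Sketch of the Connected Replication trigger: when a fact is saved, walk its
// foreign keys recursively and replicate any dimension not yet seen.
class ConnectedReplication {
    // Hypothetical domain model: entity id -> ids of the dimensions it references.
    static Map<String, List<String>> foreignKeys = Map.of(
            "trade:1", List.of("party:A", "book:B"),
            "party:A", List.of("ledgerBook:L"),
            "book:B",  List.of("ccy:USD"));

    // Dimensions already pushed to the replicated (query-layer) caches.
    static Set<String> replicated = new HashSet<>();

    static void onFactSaved(String id) {
        for (String dim : foreignKeys.getOrDefault(id, List.of())) {
            if (replicated.add(dim)) { // only recurse on first sight
                onFactSaved(dim);      // follow the dimension's own references
            }
        }
    }
}
```

The `replicated.add` check is what keeps the walk cheap: each connected dimension is visited once, and unconnected dimensions are never touched at all.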
With 'Connected Replication', only 1/10th of the data needs to be replicated (on average).
Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
hellip so we only want to lsquojoinrsquo data that is in the same
process
Trades
MTMs
Common Key
Use a Key Assignment
Policy (eg
KeyAssociation in Coherence)
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
So we prescribe different physical storage for Facts
and Dimensions
Trade
PartyTrader
Partitioned
Replicated
Facts are partitioned dimensions are replicated
Data
La
yer
Transactions
Cashflows
Query
Layer
Mtms
Fact Storage(Partitioned)
Trade
PartyTrader
Facts are partitioned dimensions are replicated
Transactions
Cashflows
Dimensions(repliacte)
Mtms
Fact Storage(Partitioned)
Facts(distribute partition)
The data volumes back this up as a sensible hypothesis
Facts=gtBig=gtDistrib
ute
Dimensions=gtSmall =gt Replicate
Key Point
We use a variant on a Snowflake Schema to
partition big entities that can be related via a partitioning key and
replicate small stuff whorsquos keys canrsquot map to our
partitioning key
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK – what if we held it all together? "Denormalised"
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- …and that means managing consistency over lots of copies
- …and all the duplication means you run out of space really quickly
- Space issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multiple machines
- Binding them back together involves a "distributed join" => Lots of network hops (2)
- Whereas in the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the speed of a denormalised one
- Looking more closely: Why does normalisation mean we have to spread data across the cluster?
- It's all about the keys
- We can collocate data with common keys, but if they crosscut the partitioning key they must be spread
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big, dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key)
- Looking at the data
- We remember we are a grid. We should avoid the distributed join
- … so we only want to 'join' data that is in the same process
- So we prescribe different physical storage for Facts and Dimensions
- Facts are partitioned, dimensions are replicated
- Facts are partitioned, dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does this help us to run queries without distributed joins?
- What would this look like without this pattern?
- But by balancing Replication and Partitioning we don't need all those hops
- Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'
- Stage 1: Get the right keys to query the Facts
- Stage 2: Cluster Join to get Facts
- Stage 2: Join the facts together efficiently, as we know they are collocated
- Stage 3: Augment raw Facts with relevant Dimensions
- Stage 3: Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do this…
- …and this…
- …and this
- …without the problems of this…
- …or this
- …all at the speed of this… well, almost
- Slide 121
- But there is a fly in the ointment…
- I lied earlier. These aren't all Facts
- We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions, a large majority are never used
- If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
- Looking at the Dimension data, some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed that 80% of data remains unused
- Slide 131
- As data is written to the data store we keep our 'Connected Caches' up to date
- The Replicated Layer is updated by recursing through the arcs on the domain model
- Saving a trade causes all its 1st-level references to be triggered
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
Facts are partitioned, dimensions are replicated

[Diagram: a Data Layer with Fact Storage (Partitioned) holding Transactions, Cashflows and MTMs; a Query Layer above holding Trade, Party and Trader.]
Facts are partitioned, dimensions are replicated (2)

[Diagram: Facts (Transactions, Cashflows, MTMs) distributed across Fact Storage (Partitioned); Dimensions replicated to every node.]
The data volumes back this up as a sensible hypothesis:

Facts => Big => Distribute
Dimensions => Small => Replicate
Key Point

We use a variant on a Snowflake Schema: we partition the big entities, which can all be related via a partitioning key, and replicate the small stuff, whose keys can't map to our partitioning key.
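The key point above can be sketched in Java. This is a toy, single-process stand-in for the grid (all names are illustrative, not the ODC API): facts are routed to a partition by the one key they share, while dimensions are simply copied to every node.

```java
import java.util.*;

// Toy sketch: facts partitioned by a shared trade key, dimensions replicated.
public class SnowflakeLayout {
    static final int PARTITIONS = 4;

    // partition index -> (trade key -> facts carrying that key)
    final List<Map<String, List<String>>> factPartitions = new ArrayList<>();
    // small dimension data, held in full on every node
    final Map<String, String> replicatedDimensions = new HashMap<>();

    SnowflakeLayout() {
        for (int i = 0; i < PARTITIONS; i++) factPartitions.add(new HashMap<>());
    }

    int partitionFor(String tradeKey) {
        return Math.abs(tradeKey.hashCode() % PARTITIONS);
    }

    // Facts (Trades, MTMs, Cashflows) all carry the partitioning key, so
    // related facts land in the same partition and can be joined locally.
    void saveFact(String tradeKey, String fact) {
        factPartitions.get(partitionFor(tradeKey))
                      .computeIfAbsent(tradeKey, k -> new ArrayList<>())
                      .add(fact);
    }

    // Dimensions have crosscutting keys, so we replicate rather than partition.
    void saveDimension(String key, String dim) {
        replicatedDimensions.put(key, dim); // broadcast to every node in the real grid
    }
}
```

Because a Trade and its MTM share a key, `partitionFor` sends them to the same partition; the dimension lookup never leaves the local node.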
So how does this help us to run queries without distributed joins?

This query involves:
• Joins between Dimensions
• Joins between Facts

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern?

[Diagram: a sequence of network calls spread over time — Get Cost Centres, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centres — each one a network hop.]
But by balancing Replication and Partitioning we don't need all those hops.

Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'

Stage 1: Get the right keys to query the Facts. Join Dimensions in the Query Layer.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

[Diagram: Transactions, Cashflows and MTMs in Partitioned Storage.]
Stage 2: Cluster Join to get Facts. Join Dimensions in the Query Layer, then join Facts across the cluster.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 2: Join the facts together efficiently, as we know they are collocated.

[Diagram: Transactions, Cashflows and MTMs joined inside Partitioned Storage.]
Stage 3: Augment raw Facts with relevant Dimensions. Join Dimensions in the Query Layer, join Facts across the cluster, then join Dimensions in the Query Layer again.

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'

Stage 3: Bind relevant dimensions to the result.
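The three stages above can be sketched as follows (illustrative names only, with a single process standing in for the cluster): stage 1 resolves the where-clause against the locally replicated dimensions to get fact keys; stage 2 joins the facts, which share a key and hence a partition; stage 3 binds replicated dimension data onto the result.

```java
import java.util.*;
import java.util.stream.*;

// Sketch of the three query stages. In the real grid, stage 2 runs in
// parallel inside each partition; here everything is one process.
public class ThreeStageQuery {
    // Replicated dimension index: cost centre -> trade keys referencing it.
    static final Map<String, List<String>> TRADES_BY_COST_CENTRE =
            Map.of("CC1", List.of("T1", "T2"), "CC2", List.of("T3"));
    // Partitioned facts, keyed by the shared trade key.
    static final Map<String, String> TRADES =
            Map.of("T1", "trade-1", "T2", "trade-2", "T3", "trade-3");
    static final Map<String, String> MTMS =
            Map.of("T1", "mtm-1", "T2", "mtm-2", "T3", "mtm-3");
    // Replicated reference data, bound in at the end.
    static final Map<String, String> REF_DATA =
            Map.of("CC1", "ref-CC1", "CC2", "ref-CC2");

    static List<String> query(String costCentre) {
        // Stage 1: use the replicated dimensions to find the fact keys.
        List<String> keys = TRADES_BY_COST_CENTRE.getOrDefault(costCentre, List.of());
        // Stage 2: join Trade and MTM facts; same key => same partition, no hop.
        // Stage 3: bind the replicated dimension data onto each row.
        return keys.stream()
                   .map(k -> TRADES.get(k) + "|" + MTMS.get(k) + "|" + REF_DATA.get(costCentre))
                   .collect(Collectors.toList());
    }
}
```

Calling `query("CC1")` touches the network only once, to fan the fact lookup out to the partitions; every dimension access is a local map read.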
Bringing it together

[Diagram: a Java client API sitting over Replicated Dimensions and Partitioned Facts.]

We never have to do a distributed join. All the big stuff is held partitioned, and we can join without shipping keys around or holding intermediate results.
We get to do this…

[Diagram: a Trade bound to its Party and Trader, normalised.]

…and this…

[Diagram: the Trade, Party and Trader held across Versions 1 to 4.]

…and this…

[Diagram: many Trades, Parties and Traders held together.]

…without the problems of this…

…or this…

…all at the speed of this… well, almost.
But there is a fly in the ointment…

I lied earlier. These aren't all Facts.

[Diagram: Facts vs Dimensions. This one is a dimension: it has a different key to the Facts, and it's BIG.]

We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.
Fortunately there is a simple solution: the Connected Replication Pattern.
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".

If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.

Looking at the Dimension data, some are quite large. But Connected Dimension Data is tiny by comparison.

One recent independent study from the database community showed that 80% of data remains unused.

So we only replicate 'Connected' or 'Used' dimensions.
As data is written to the data store we keep our 'Connected Caches' up to date.

[Diagram: a Data Layer with Fact Storage (Partitioned) for Transactions, Cashflows and MTMs; a Processing Layer with replicated Dimension Caches.]

As new Facts are added, the relevant Dimensions that they reference are moved to processing-layer caches.
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.

Saving a trade causes all its 1st-level references to be triggered.

[Diagram: Save Trade fires a Trigger in the Partitioned Cache / Cache Store; the Trade's references (Party, Alias, Source, Book, Ccy) flow from the Data Layer (all normalised) to the Query Layer (with connected dimension caches).]
This updates the connected caches.

[Diagram: Party, Alias, Source, Book and Ccy now present in the Query Layer's connected dimension caches.]
The process recurses through the object graph.

[Diagram: the recursion continues from Party to LedgerBook, pulling second-level dimensions into the connected caches.]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.

With 'Connected Replication', only 1/10th of the data needs to be replicated (on average).
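The recursion can be sketched as follows (a hypothetical domain model; the real implementation triggers off cache-store writes): saving a fact walks its foreign-key arcs, promotes each dimension it reaches into the replicated 'connected cache', and recurses onward through the graph.

```java
import java.util.*;

// Sketch of Connected Replication: when a fact is saved, recurse through
// its references and replicate only the dimensions it actually touches.
public class ConnectedReplication {
    // Each entity and the entities it references (its foreign-key arcs).
    static final Map<String, List<String>> ARCS = Map.of(
            "Trade", List.of("Party", "Book", "Ccy"),
            "Party", List.of("Alias", "LedgerBook"),
            "Book",  List.of("Source"),
            "Ccy",   List.of(),
            "Alias", List.of(),
            "LedgerBook", List.of(),
            "Source", List.of());

    // Dimensions promoted into the replicated layer so far.
    static final Set<String> connectedCache = new TreeSet<>();

    static void onSave(String entity) {
        for (String ref : ARCS.getOrDefault(entity, List.of())) {
            if (connectedCache.add(ref)) { // add() is false if already replicated
                onSave(ref);               // recurse through the object graph
            }
        }
    }
}
```

`onSave("Trade")` triggers the 1st-level references (Party, Book, Ccy) and the recursion then reaches Alias, LedgerBook and Source; dimensions never referenced by any saved fact are never replicated.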
Limitations of this approach

• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).
Conclusion

• Traditional database architectures are inappropriate for very low-latency or very high-throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End

• Further details online: http://www.benstopford.com
• Questions?
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
Key Point
We use a variant on a Snowflake Schema to partition big entities that can be related via a partitioning key, and replicate small stuff whose keys can't map to our partitioning key
Replicate
Distribute
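The partition-big, replicate-small split can be sketched as a toy store. This is a minimal illustration with hypothetical names, not ODC's actual API (ODC itself sits on a Java data grid); plain dictionaries stand in for the grid's caches:

```python
# Illustrative sketch (hypothetical names, not ODC's API): a hybrid store
# that partitions Facts by a single partitioning key and conceptually
# replicates small Dimensions to every node.

class HybridStore:
    def __init__(self, node_count):
        self.node_count = node_count
        self.partitions = [{} for _ in range(node_count)]  # Facts: one shard per node
        self.replicated = {}  # Dimensions: a full copy lives on every node

    def put_fact(self, partition_key, fact):
        # All Facts sharing a partitioning key land on the same node,
        # so they can later be joined locally, in a single process.
        shard = hash(partition_key) % self.node_count
        self.partitions[shard].setdefault(partition_key, []).append(fact)

    def put_dimension(self, dim_key, dimension):
        # Dimensions are small, so every node can afford a full copy.
        self.replicated[dim_key] = dimension

    def facts_for(self, partition_key):
        # Same hash routing as put_fact: the lookup goes to one shard only.
        shard = hash(partition_key) % self.node_count
        return self.partitions[shard].get(partition_key, [])
```

Because both writes and reads route by the same partitioning key, a query for one key never touches more than one shard, while dimension lookups are always local.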
So how do they help us to run queries without distributed joins?
This query involves:
• Joins between Dimensions
• Joins between Facts
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
What would this look like without this pattern
(Diagram, network hops over time: Get Cost Centers → Get LedgerBooks → Get SourceBooks → Get Transactions → Get MTMs → Get Legs → Get Cost Centers)
But by balancing Replication and Partitioning we don't need all those hops
Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'
Partitioned Storage: Transactions, Cashflows, MTMs
Stage 1: Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Partitioned Storage: Transactions, Cashflows, MTMs
Stage 2: Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 2: Join the facts together efficiently, as we know they are collocated
Partitioned Storage: Transactions, Cashflows, MTMs
Stage 3: Augment raw Facts with relevant Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 3: Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join. So all the big stuff is held partitioned. And we can join without shipping keys around and having intermediate results.
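The three stages (resolve keys locally from replicated dimensions, cluster-join collocated Facts, bind dimensions to the result) can be sketched minimally. Plain dictionaries stand in for the grid's replicated and partitioned caches; all names here are hypothetical, not the real ODC API:

```python
# Illustrative three-stage query sketch (hypothetical structures, not ODC's API).

def query(trade_keys_by_cost_centre, facts_by_trade_key, dimensions, cost_centre):
    # Stage 1: a local, non-distributed lookup in replicated dimension data
    # turns "Cost Centre = 'CC1'" into a set of Fact partitioning keys.
    trade_keys = trade_keys_by_cost_centre.get(cost_centre, [])

    # Stage 2: Facts sharing a partitioning key are collocated, so each key's
    # Trades, MTMs and Cashflows join inside a single process, no network hops.
    facts = [fact for key in trade_keys for fact in facts_by_trade_key.get(key, [])]

    # Stage 3: augment each raw Fact with its dimension data, again from the
    # local replicated cache.
    return [{**fact, "dimension": dimensions.get(fact["dim_key"])} for fact in facts]
```

Every lookup in stages 1 and 3 hits a replicated (hence local) cache, and stage 2 hits exactly one partition per key, which is why no distributed join is ever needed.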
We get to do this…
(Diagram: Trade–Party–Trader object graphs)
…and this…
(Diagram: Trade–Party–Trader graphs at Versions 1 to 4)
and this
(Diagram: Trade, Party and Trader entities linked across graphs)
…without the problems of this…
…or this
all at the speed of this… well, almost
But there is a fly in the ointment…
I lied earlier. These aren't all Facts.
Facts
Dimensions
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication Pattern
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".
If there are no Trades for Goldmans in the data store, then a Trade Query will never need the Goldmans Counterparty.
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80% of data remains unused
So we only replicate 'Connected' or 'Used' dimensions
As data is written to the data store we keep our 'Connected Caches' up to date.
(Diagram: replicated Dimension Caches in the Processing Layer; partitioned Fact Storage (Transactions, Cashflows, MTMs) in the Data Layer)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all its 1st-level references to be triggered
(Diagram: Save Trade → Partitioned Cache → Cache Store → Trigger. The Trade's references (Party, Alias, Source Book, Ccy) flow from the Data Layer (All Normalised) into the Query Layer's connected dimension Caches)
This updates the connected caches
(Diagram: the Trade's references (Party, Alias, Source Book, Ccy) now populate the Query Layer's connected dimension Caches, above the Data Layer (All Normalised))
The process recurses through the object graph
(Diagram: the recursion continues from Party to LedgerBook; Data Layer (All Normalised), Query Layer with connected dimension Caches)
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
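That recursion can be sketched in a few lines, assuming each entity exposes its foreign keys as a simple list. The structures and names are hypothetical, not the real implementation:

```python
# Illustrative sketch of Connected Replication (hypothetical structures):
# when a Fact is saved, walk its foreign keys and promote each referenced
# dimension into the replicated cache, then recurse. Only dimensions
# actually connected to some Fact are ever replicated.

def replicate_connected(entity, normalised_store, replicated_cache):
    for dim_key in entity.get("refs", []):        # the entity's foreign keys
        if dim_key in replicated_cache:
            continue                              # already connected: stop here
        dimension = normalised_store[dim_key]
        replicated_cache[dim_key] = dimension     # promote to the replicated layer
        # Recurse through the dimension's own references (e.g. Party -> LedgerBook).
        replicate_connected(dimension, normalised_store, replicated_cache)
```

The `if dim_key in replicated_cache` check both avoids redundant work and terminates the recursion if the domain model contains reference cycles.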
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)
Limitations of this approach
• Data set size: the size of connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures. These favour scalability.
Conclusion
At the other end are in-memory architectures, ideally using a single address space.
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning we can do any join in a single step
Conclusion
With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern
The End
• Further details online: http://www.benstopford.com
• Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Replicate
Distribute
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
So how does they help us to run queries without
distributed joins
This query involvesbull Joins between Dimensionsbull Joins between Facts
Select Transaction MTM RefrenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
What would this look like without this pattern
Get Cost
Centers
Get LedgerBooks
Get SourceBooks
Get Transac-tions
Get MTMs
Get Legs
Get Cost
Centers
Network
Time
But by balancing Replication and Partitioning we donrsquot need all
those hops
Stage 1 Focus on the where clause
Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 1 Get the right keys to query the Facts
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 2 Cluster Join to get Facts
Join Dimensions in Query Layer
Join Facts across cluster
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 2 Join the facts together effi ciently as we know they are
collocated
Transactions
Cashflows
Mtms
Partitioned Storage
Stage 3 Augment raw Facts with relevant
Dimensions
Join Dimensions in Query Layer
Join Facts across cluster
Join Dimensions in Query Layer
Select Transaction MTM ReferenceData From MTM Transaction Ref Where Cost Centre = lsquoCC1rsquo
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
What would this look like without this pattern?
[Diagram: a chain of sequential calls spread along a network/time axis: Get Cost Centers, Get LedgerBooks, Get SourceBooks, Get Transactions, Get MTMs, Get Legs, Get Cost Centers, each hop adding network time]
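The cost of that chain can be sketched with some illustrative arithmetic. The per-hop figures below are assumptions for the sake of the example, not measurements; the hop list comes from the diagram above:

```python
# Illustrative (not measured) cost of resolving one query as sequential
# network hops versus collocated in-process lookups.
NETWORK_HOP_MS = 0.5          # an assumed LAN round trip, per hop
IN_PROCESS_LOOKUP_MS = 2e-5   # ~20 ns for a HashMap lookup, as earlier in the talk

# The hop sequence from the diagram: each call must wait for the previous one.
hops = ["CostCenters", "LedgerBooks", "SourceBooks", "Transactions", "MTMs", "Legs"]

def total_latency_ms(per_step_ms, steps):
    """Sequential steps add up: no parallelism is available across dependent hops."""
    return per_step_ms * steps

distributed = total_latency_ms(NETWORK_HOP_MS, len(hops))
collocated = total_latency_ms(IN_PROCESS_LOOKUP_MS, len(hops))
print(f"{len(hops)} network hops: {distributed:.2f} ms; collocated: {collocated:.5f} ms")
```

Even with generous assumptions, the network-bound plan is four orders of magnitude slower before any real work is done.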
But by balancing Replication and Partitioning we don't need all those hops.
Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 1: Get the right keys to query the Facts.
[Diagram: dimensions are joined in the query layer; the facts (Transactions, Cashflows, MTMs) sit in partitioned storage]
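A minimal Python sketch of Stage 1, with made-up keys and a hypothetical replicated index (none of these names come from ODC itself):

```python
# Dimension rows are replicated to every node, so these lookups are in-process.
cost_centre_dim = {"CC1": 1, "CC2": 2}  # code -> cost_centre_id (hypothetical)

# A replicated index from dimension key to the fact keys (trade ids) that use it.
trade_ids_by_cost_centre = {1: [101, 102], 2: [103]}

def fact_keys_for(cost_centre_code):
    """Stage 1: resolve the where clause against replicated dimensions,
    yielding the partitioning keys of the Facts we need."""
    cc_id = cost_centre_dim[cost_centre_code]
    return trade_ids_by_cost_centre[cc_id]

print(fact_keys_for("CC1"))  # -> [101, 102]
```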
Stage 2: Cluster Join to get Facts.
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 2: Join the facts together efficiently, as we know they are collocated.
[Diagram: dimensions joined in the query layer; facts (Transactions, Cashflows, MTMs) joined across the cluster in partitioned storage]
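Stage 2 can be sketched similarly. Because every fact type shares the trade id as its partitioning key, all rows for one trade are collocated and the join never leaves the process (data and names are hypothetical):

```python
# One partition's contents: every fact type keyed by the same trade id.
partition = {
    "transactions": {101: {"qty": 5},        102: {"qty": 8}},
    "cashflows":    {101: {"amount": 100.0}, 102: {"amount": 250.0}},
    "mtms":         {101: {"value": 9.5},    102: {"value": -1.2}},
}

def join_facts(trade_id):
    """Stage 2: join Transactions, Cashflows and MTMs for one trade.
    All three rows live in this partition, so no network hop is needed."""
    return {fact: rows[trade_id] for fact, rows in partition.items()}

print(join_facts(101)["mtms"])  # -> {'value': 9.5}
```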
Stage 3: Augment raw Facts with relevant Dimensions.
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 3: Bind relevant dimensions to the result.
[Diagram: dimensions joined in the query layer, before and after the cluster-wide fact join]
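And Stage 3: the joined fact rows are decorated with dimension data from the replicated caches in the query layer (again a hypothetical sketch, not ODC's actual API):

```python
# Replicated dimension caches, available on every query-layer node.
ccy_dim = {7: "GBP"}
counterparty_dim = {3: "ACME Bank"}

def bind_dimensions(fact_row):
    """Stage 3: augment a raw fact with its dimensions, using in-process lookups only."""
    bound = dict(fact_row)
    bound["ccy"] = ccy_dim[fact_row["ccy_id"]]
    bound["counterparty"] = counterparty_dim[fact_row["counterparty_id"]]
    return bound

result = bind_dimensions({"trade_id": 101, "ccy_id": 7, "counterparty_id": 3})
print(result["ccy"], result["counterparty"])  # -> GBP ACME Bank
```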
Bringing it together
[Diagram: a Java client calls the API layer, which sits over Replicated Dimensions and Partitioned Facts]
We never have to do a distributed join. All the big stuff is held partitioned, and we can join without shipping keys around or holding intermediate results.
We get to do this…
[Diagram: whole Trade, Party, Trader object graphs, joined in one process]
…and this…
[Diagram: the same Trade, Party, Trader graph held as Versions 1 to 4]
and this
[Diagram: many Trades sharing the same Party and Trader instances]
…without the problems of this…
…or this
all at the speed of this… well, almost.
But there is a fly in the ointment…
I lied earlier. These aren't all Facts.
Facts vs Dimensions
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem.
Fortunately there is a simple solution: the Connected Replication Pattern.
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".
If there are no Trades for Goldmans in the data store, then a Trade Query will never need the Goldmans Counterparty.
Looking at the Dimension data, some are quite large. But Connected Dimension Data is tiny by comparison.
One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'Connected' or 'Used' dimensions.
As data is written to the data store, we keep our 'Connected Caches' up to date.
[Diagram: the Data Layer holds Fact Storage (Partitioned): Transactions, Cashflows, MTMs; the Processing Layer holds Dimension Caches (Replicated)]
As new Facts are added, relevant Dimensions that they reference are moved to the processing-layer caches.
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.
Saving a trade causes all of its first-level references to be triggered.
[Diagram: Save Trade writes into the partitioned cache in the Data Layer (All Normalised); a cache-store trigger fires for the Trade's references: Party, Alias, Source Book, Ccy]
This updates the connected caches.
[Diagram: Party, Alias, Source Book and Ccy now sit in the Query Layer's connected dimension caches, above the Data Layer (All Normalised)]
The process recurses through the object graph.
[Diagram: the recursion continues from Party and Book to further dimensions such as LedgerBook, which are pulled into the Query Layer's connected dimension caches]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
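The recursion itself is simple to sketch. Treating the domain model as a map from each entity to the entities it references (a hypothetical graph echoing the slides, not ODC's real schema), connected replication is a reachability walk from the saved Fact:

```python
# A hypothetical domain model as foreign-key arcs: entity -> entities it references.
refs = {
    "Trade": ["Party", "SourceBook", "Ccy"],
    "Party": ["Alias"],
    "SourceBook": ["Book"],
    "Book": ["LedgerBook"],
    "Ccy": [],
    "Alias": [],
    "LedgerBook": [],
}

def connected_dimensions(fact):
    """Recurse through the foreign-key arcs, collecting every reachable dimension.
    Only these 'connected' dimensions need to be replicated."""
    seen = set()
    stack = list(refs.get(fact, []))
    while stack:
        dim = stack.pop()
        if dim not in seen:
            seen.add(dim)
            stack.extend(refs.get(dim, []))
    return seen

# Saving a Trade triggers replication of its transitive references only.
print(sorted(connected_dimensions("Trade")))
# -> ['Alias', 'Book', 'Ccy', 'LedgerBook', 'Party', 'SourceBook']
```

Dimensions with no path from any stored Fact (the Goldmans Counterparty with no Goldmans Trades) are simply never visited, so they are never replicated.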
Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between 'Facts' that can share a partitioning key (but any dimension join can be supported).
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
Conclusion
At one end of the scale are the huge shared-nothing architectures. These favour scalability.
Conclusion
At the other end are in-memory architectures, ideally using a single address space.
Conclusion
You can blend the two approaches (for example, ODC).
Conclusion
ODC attacks the Distributed Join Problem in an unusual way.
Conclusion
By balancing Replication and Partitioning we can do any join in a single step.
Conclusion
With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Stage 1: Get the right keys to query the Facts
[Diagram: Transactions, Cashflows and MTMs held in Partitioned Storage; dimensions joined in the Query Layer]
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 2: Cluster Join to get Facts
[Diagram: Transactions, Cashflows and MTMs in Partitioned Storage; facts joined across the cluster; dimensions joined in the Query Layer]
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 2: Join the facts together efficiently, as we know they are collocated.
Stage 3: Augment raw Facts with relevant Dimensions
[Diagram: Transactions, Cashflows and MTMs in Partitioned Storage; facts joined across the cluster; dimensions joined in the Query Layer]
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 3: Bind relevant dimensions to the result.
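The three stages above can be sketched in a few lines of Python. This is a toy model, not the ODC API: the names (`cost_centres`, `partitions`, `reference_data`) and the in-process dicts standing in for caches are all illustrative assumptions.

```python
# Replicated dimension data: a full copy lives on every node, so the
# where-clause and the final enrichment never leave the local process.
cost_centres = {"CC1": ["trade1", "trade3"]}           # dimension -> fact keys
reference_data = {"trade1": "refA", "trade3": "refB"}  # replicated dimension

# Partitioned fact data: each dict stands in for one node's share.
partitions = [
    {"trade1": {"mtm": 10.0}},
    {"trade3": {"mtm": -4.0}},
]

def query(cost_centre):
    # Stage 1: use a replicated dimension to resolve the where-clause
    # into the partitioning keys of the facts we need.
    keys = cost_centres[cost_centre]
    # Stage 2: join the facts inside each partition; facts sharing a key
    # are collocated, so no keys or intermediate results cross the wire.
    rows = []
    for node in partitions:
        for k in keys:
            if k in node:
                rows.append({"trade": k, **node[k]})
    # Stage 3: bind the replicated dimensions to the result locally.
    for row in rows:
        row["ref"] = reference_data[row["trade"]]
    return rows

result = query("CC1")  # two enriched rows, no distributed join
```

The design point is that only Stage 2 touches remote nodes at all, and even there each node works purely on its own partition.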
Bringing it together
[Diagram: a Java client calls the API, which sits over Replicated Dimensions and Partitioned Facts]
We never have to do a distributed join. All the big stuff is held partitioned, and we can join without shipping keys around or building intermediate results.
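The storage split described above can be sketched as follows. This is a minimal illustration of the idea, not ODC's implementation: `Grid`, `put_fact` and `put_dimension` are hypothetical names, and simple hashing stands in for whatever routing the real grid uses.

```python
class Node:
    def __init__(self):
        self.facts = {}       # partitioned: only this node's share
        self.dimensions = {}  # replicated: a full copy on every node

class Grid:
    def __init__(self, n):
        self.nodes = [Node() for _ in range(n)]

    def put_fact(self, key, value):
        # Facts share one partitioning key: hash it to pick exactly one node.
        self.nodes[hash(key) % len(self.nodes)].facts[key] = value

    def put_dimension(self, key, value):
        # Dimensions are small: broadcast so every join is process-local.
        for node in self.nodes:
            node.dimensions[key] = value

grid = Grid(3)
grid.put_fact("trade1", {"mtm": 10.0})
grid.put_dimension("ccy:GBP", {"name": "Sterling"})

stored = sum(1 for n in grid.nodes if "trade1" in n.facts)           # 1 copy
replicas = sum(1 for n in grid.nodes if "ccy:GBP" in n.dimensions)   # 3 copies
```

The trade-off is the one the slides describe: partitioning keeps the big data scalable, replication keeps the small data joinable everywhere.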
We get to do this…
[Diagram: a Trade object graph with its Party and Trader]
…and this…
[Diagram: the same Trade/Party/Trader graph held as Versions 1 to 4]
…and this…
[Diagram: many Trades sharing common Parties and Traders]
…without the problems of this…
…or this…
…all at the speed of this… well, almost.
But there is a fly in the ointment…
I lied earlier: these aren't all Facts.
[Diagram: the model split into Facts and Dimensions]
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space ⇒ Big Dimensions are a problem.
Fortunately there is a simple solution: the Connected Replication Pattern.
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".
If there are no Trades for Goldmans in the data store, then a Trade Query will never need the Goldmans Counterparty.
Looking at the Dimension data, some are quite large.
But Connected Dimension Data is tiny by comparison.
One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'Connected' or 'Used' dimensions.
As data is written to the data store, we keep our 'Connected Caches' up to date.
[Diagram: a Data Layer holding Transactions, Cashflows and MTMs in partitioned Fact Storage, and a Processing Layer holding replicated Dimension Caches]
As new Facts are added, relevant Dimensions that they reference are moved to the processing-layer caches.
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.
Saving a trade causes all its first-level references to be triggered.
[Diagram: "Save Trade" hits the partitioned cache in the Data Layer (all normalised); a cache-store trigger fires for the Trade's references (Party, Alias, Source, Book, Ccy); the Query Layer holds the connected dimension caches]
This updates the connected caches.
[Diagram: the Trade's first-level references (Party, Alias, Source, Book, Ccy) are copied from the Data Layer (all normalised) into the Query Layer's connected dimension caches]
The process recurses through the object graph.
[Diagram: second-level references, such as the Party's LedgerBook, follow in turn from the Data Layer (all normalised) into the Query Layer's connected dimension caches]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
With 'Connected Replication', only 1/10th of the data needs to be replicated (on average).
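The recursion is simple enough to sketch directly. A toy model, assuming nothing about ODC's real trigger mechanism: `references` stands in for the foreign-key arcs of the domain model, and plain dicts stand in for the normalised store and the replicated layer.

```python
# Foreign-key arcs of a toy domain model: entity -> referenced dimensions.
references = {
    "trade1": ["party1", "source1", "ccy:GBP"],
    "party1": ["alias1", "book1"],
    "book1":  ["ledgerbook1"],
}

# Full normalised dimension store (data layer).
dimensions = {
    "party1": {}, "alias1": {}, "book1": {}, "ledgerbook1": {},
    "source1": {}, "ccy:GBP": {},
    "party2": {},  # never referenced by any fact: stays unreplicated
}

connected_cache = {}  # replicated layer: holds only 'connected' dimensions

def on_save(entity_key):
    # Recurse through the arcs of the domain model, replicating every
    # dimension reachable from the saved fact (and nothing else).
    for ref in references.get(entity_key, []):
        if ref not in connected_cache:
            connected_cache[ref] = dimensions[ref]
            on_save(ref)

on_save("trade1")
# connected_cache now holds the six reachable dimensions; 'party2' is excluded.
```

Unused dimensions like the hypothetical `party2` (the Goldmans-with-no-trades case from the slides) never enter the replicated layer, which is where the order-of-magnitude space saving comes from.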
Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).
Conclusion
• Traditional database architectures are inappropriate for very low-latency or very high-throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, we can do any join in a single step over partitioned storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
[Diagram: Partitioned Storage holding Transactions, Cashflows and MTMs]
Stage 3: Augment raw Facts with relevant Dimensions
[Diagram: Join Facts across the cluster, then join Dimensions in the Query Layer]
Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = 'CC1'
Stage 3: Bind relevant dimensions to the result
Bringing it together
[Diagram: Java client → API → Replicated Dimensions and Partitioned Facts]
We never have to do a distributed join. So all the big stuff is held partitioned, and we can join without shipping keys around and building intermediate results.
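The three query stages can be sketched in a few lines. This is a hypothetical toy model (the class and map names are mine, not ODC's): a small dimension map stands in for the replicated caches every node holds, and a fact map keyed by trade id stands in for the partitioned store. Each stage runs against local data, so no cross-node join is ever issued.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch of the collocated-join idea. Dimension data (small)
// is replicated to every node; Fact data (big) is partitioned by its key.
// A query then runs in three local stages with no distributed join.
class CollocatedJoin {

    // Replicated dimension: book -> cost centre (a copy lives on every node).
    static final Map<String, String> bookToCostCentre =
            Map.of("B1", "CC1", "B2", "CC1", "B3", "CC2");

    // Partitioned facts: tradeId -> book (spread across nodes by trade id).
    static final Map<Integer, String> tradeToBook =
            Map.of(1, "B1", 2, "B2", 3, "B3");

    // Stage 1: resolve the where-clause against the local dimension copy.
    static Set<String> booksFor(String costCentre) {
        return bookToCostCentre.entrySet().stream()
                .filter(e -> e.getValue().equals(costCentre))
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }

    // Stage 2: fetch the matching facts from the partitioned store, then
    // Stage 3: bind the dimension back onto each fact -- all locally.
    static List<String> query(String costCentre) {
        Set<String> books = booksFor(costCentre);
        return tradeToBook.entrySet().stream()
                .filter(e -> books.contains(e.getValue()))
                .map(e -> "trade=" + e.getKey()
                        + " book=" + e.getValue()
                        + " cc=" + bookToCostCentre.get(e.getValue()))
                .sorted()
                .collect(Collectors.toList());
    }
}
```

The point of the sketch is the data placement, not the lookups themselves: because the dimension map is local everywhere, Stage 1 costs nothing over the network, and Stage 2 becomes a simple keyed fetch against the partitioned layer.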
We get to do this… [diagram: a Trade joined to its Party and Trader]
…and this… [diagram: the same Trade/Party/Trader graph held as Versions 1 to 4]
and this… [diagram: many Trades, Parties and Traders joined into one graph]
…without the problems of this…
…or this
all at the speed of this… well, almost
But there is a fly in the ointment…
I lied earlier. These aren't all Facts.
[Diagram: Facts alongside Dimensions]
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem
Fortunately there is a simple solution: the Connected Replication Pattern.
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".
If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.
Looking at the Dimension data, some are quite large.
But Connected Dimension data is tiny by comparison.
One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'Connected' or 'Used' dimensions.
As data is written to the data store we keep our 'Connected Caches' up to date.
[Diagram: Processing Layer with replicated Dimension Caches; Data Layer with partitioned Fact Storage holding Transactions, Cashflows and MTMs]
As new Facts are added, the relevant Dimensions that they reference are moved to the processing layer caches.
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.
Saving a trade causes all its first-level references to be triggered.
[Diagram: Save Trade hits the Partitioned Cache in the Data Layer (all normalised); the Cache Store trigger pushes the referenced Party, Alias, Source, Book and Ccy towards the Query Layer (with connected dimension caches)]
This updates the connected caches.
[Diagram: the Trade's Party, Alias, Source, Book and Ccy now sit in the Query Layer caches]
The process recurses through the object graph.
[Diagram: the recursion continues from Party to its own references, e.g. a further Party and a LedgerBook]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
With 'Connected Replication', only 1/10th of the data needs to be replicated (on average).
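The recursion the pattern describes is a plain graph walk over foreign keys. A minimal sketch, with an invented toy domain model (the entity names follow the diagrams above, but the map and method names are mine): saving a fact visits its references, then their references, and the set of everything reached is exactly what gets replicated.

```java
import java.util.*;

// Hypothetical sketch of the Connected Replication pattern: when a fact
// is saved, recurse through its foreign-key references. Every dimension
// reached is 'connected', and only connected dimensions are replicated.
class ConnectedReplication {

    // Toy domain model: entity -> the entities it references.
    static final Map<String, List<String>> refs = Map.of(
            "Trade", List.of("Party", "Book", "Ccy"),
            "Party", List.of("Alias", "LedgerBook"),
            "Book", List.of("Ccy"));

    // Dimensions reachable from a saved fact (the fact itself stays partitioned).
    static Set<String> connectedDimensions(String fact) {
        Set<String> seen = new LinkedHashSet<>();
        visit(fact, seen);
        seen.remove(fact);
        return seen;
    }

    private static void visit(String entity, Set<String> seen) {
        if (!seen.add(entity)) return;      // already visited: stop recursing
        for (String ref : refs.getOrDefault(entity, List.of()))
            visit(ref, seen);
    }
}
```

Note the visited-set check: real domain models contain cycles (a Party referencing a Party), so the walk must remember what it has already seen, and dimensions that no stored fact ever reaches are simply never replicated.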
Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications
• At one end of the scale are the huge shared nothing architectures. These favour scalability
• At the other end are in-memory architectures, ideally using a single address space
• You can blend the two approaches (for example ODC)
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning we can do any join in a single step
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern
The End
• Further details online: http://www.benstopford.com
• Questions?
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Stage 3 Bind relevant dimensions to the result
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Bringing it together
Java client
API
Replicated Dimensions
Partitioned Facts
We never have to do a distributed join
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
So all the big stuff is held partitioned
And we can join without shipping keys around and
having intermediate
results
We get to do thishellip
Trade
Party
Trader
Trade
Party
Trader
hellipand thishellip
Trade
Party
Trader Version 1
Trade
Party
Trader Version 2
Trade
Party
Trader Version 3
Trade
Party
Trader Version 4
and this
Trade
Party
Trader
Trade
Trade
Party
Party
Party
Trader
Trader
hellipwithout the problems of thishellip
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store, we keep our 'Connected Caches' up to date.
[Diagram: Data Layer with partitioned Fact Storage (Transactions, Cashflows, MTMs) feeding replicated Dimension Caches in the Processing Layer]
As new Facts are added, the relevant Dimensions that they reference are moved to the processing-layer caches.
The Replicated Layer is updated by recursing through the arcs on the domain model when Facts change.
Saving a trade causes all its first-level references to be triggered.
[Diagram: Save Trade hits the partitioned cache store in the Data Layer (All Normalised); a trigger pushes the Trade's references (Party, Alias, Source, Book, Ccy) to the Query Layer's connected dimension caches]
This updates the connected caches.
[Diagram: Party, Alias, Source, Book and Ccy copied from the Data Layer (All Normalised) to the Query Layer (with connected dimension caches)]
The process recurses through the object graph.
[Diagram: the recursion continues from Party on to LedgerBook, again from the Data Layer (All Normalised) to the Query Layer (with connected dimension caches)]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
With 'Connected Replication', only 1/10th of the data needs to be replicated (on average).
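The recursion the pattern describes can be sketched like this (hypothetical Python; the domain model below is illustrative):

```python
# Each entity lists the dimension entities it references (its foreign-key arcs)
domain_model = {
    "Trade": ["Party", "Book", "Ccy"],
    "Party": ["Alias", "LedgerBook"],
    "Book": ["Source"],
    "Ccy": [],
    "Alias": [],
    "LedgerBook": [],
    "Source": [],
}

def replicate_connected(entity, replicated=None):
    """Walk the foreign-key arcs from a saved fact, replicating every
    dimension reached and recursing into that dimension's own references."""
    if replicated is None:
        replicated = set()
    for ref in domain_model[entity]:
        if ref not in replicated:  # skip already-replicated entities (and cycles)
            replicated.add(ref)
            replicate_connected(ref, replicated)
    return replicated

# Saving a Trade triggers its first-level references, then recurses
touched = replicate_connected("Trade")
```

Only entities reachable from a stored fact are ever touched, which is why disconnected dimension data never gets replicated.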
Limitations of this approach:
• Data set size: the size of the connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioned Storage so we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a "distributed join" => Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK – what if we held it all together "Denormalised"
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- …and that means managing consistency over lots of copies
- …and all the duplication means you run out of space really quic
- Space issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a "distributed join" => Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- It's all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- … so we only want to 'join' data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how do they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we don't need all
- Stage 1 Focus on the where clause Where Cost Centre = 'CC1'
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do this…
- …and this…
- and this
- …without the problems of this…
- …or this
- all at the speed of this… well almost
- Slide 121
- But there is a fly in the ointment…
- I lied earlier These aren't all Facts
- We can't replicate really big stuff… we'll run out of space =>
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our 'Connected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all its 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
…without the problems of this…
…or this
all at the speed of this… well, almost
But there is a fly in the ointment…
I lied earlier. These aren't all Facts.
Facts
Dimensions
This is a dimension:
• It has a different key to the Facts
• And it's BIG
We can't replicate really big stuff… we'll run out of space => Big Dimensions are a problem
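To put rough numbers on the problem (illustrative figures only, not from the talk): replicating a dimension gives every node its own full copy, so its footprint scales with cluster size, whereas a partitioned store holds just one copy in total.

```python
# Illustrative figures only: one 10 GB dimension on a 50-node grid.
nodes = 50
dimension_gb = 10

replicated_total = dimension_gb * nodes   # a full copy on every node
partitioned_total = dimension_gb          # one slice per node, same total

print(replicated_total, "GB if replicated")    # 500 GB of RAM for one dimension
print(partitioned_total, "GB if partitioned")  # 10 GB spread across the grid
```

Hence big dimensions cannot simply be replicated the way small ones are.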
Fortunately there is a simple solution:
The Connected Replication Pattern
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".
If there are no Trades for Goldmans in the data store, then a Trade Query will never need the Goldmans Counterparty.
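That intuition fits in a few lines (a toy sketch with invented names, not ODC code): a dimension entry is 'connected' only if at least one fact references it, and only connected entries ever need replicating.

```python
# Toy data, invented names: Trades are the facts, Counterparties a dimension.
trades = [
    {"id": 1, "counterparty": "MORGAN"},
    {"id": 2, "counterparty": "BARCAP"},
]
counterparties = {"MORGAN": {}, "BARCAP": {}, "GOLDMAN": {}}

# Only counterparties referenced by some trade are 'connected',
# so only they need replicating to the query layer.
connected = {t["counterparty"] for t in trades}
replicated = {k: v for k, v in counterparties.items() if k in connected}

assert "GOLDMAN" not in replicated  # no Goldmans trades => never replicated
```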
Looking at the Dimension data, some are quite large.
But Connected Dimension Data is tiny by comparison.
One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'Connected' or 'Used' dimensions.
As data is written to the data store we keep our 'Connected Caches' up to date.
[Diagram: a Data Layer with Fact Storage (Partitioned) holding Transactions, Cashflows and MtMs, and a Processing Layer with Dimension Caches (Replicated)]
As new Facts are added, relevant Dimensions that they reference are moved to processing layer caches.
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.
Saving a trade causes all its first-level references to be triggered.
[Diagram: "Save Trade" into the Partitioned Cache of the Data Layer (All Normalised) fires a Cache Store trigger; the Trade's references (Party, Alias, Source, Book, Ccy) flow up to the Query Layer (with connected dimension Caches)]
This updates the connected caches.
[Diagram: the same object graph (Trade, Party, Alias, Source, Book, Ccy), with the referenced dimensions now present in the Query Layer's connected dimension Caches]
The process recurses through the object graph.
[Diagram: the recursion continues into second-level references such as the Party's Alias and the Book's LedgerBook, all replicated from the Data Layer (All Normalised) into the Query Layer's connected dimension Caches]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
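A minimal sketch of that recursion (hypothetical structures; the real ODC sat on a data grid, not plain Python): on save, walk the fact's foreign keys and recurse into each referenced dimension's own references, collecting everything that must live in the replicated layer.

```python
# Hypothetical domain model: each entity lists its foreign-key references.
REFERENCES = {
    "Trade":  ["Party", "Book", "Source", "Ccy"],
    "Party":  ["Alias"],
    "Book":   ["LedgerBook"],
    "Source": [], "Ccy": [], "Alias": [], "LedgerBook": [],
}

def connected_dimensions(root, refs=REFERENCES):
    """Recurse through the arcs of the domain model from a saved fact,
    collecting every dimension that belongs in the replicated layer."""
    seen = set()
    def walk(entity):
        for dim in refs[entity]:
            if dim not in seen:
                seen.add(dim)
                walk(dim)  # the process recurses through the object graph
    walk(root)
    return seen

print(sorted(connected_dimensions("Trade")))
# ['Alias', 'Book', 'Ccy', 'LedgerBook', 'Party', 'Source']
```

Saving a Trade thus pulls in not just its direct references but second-level ones like the Book's LedgerBook.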
Limitations of this approach
• Data set size: the size of connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared nothing architectures. These favour scalability.
• At the other end are in memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, we can do any join in a single step over Partitioned Storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
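The single-step join described above can be sketched as follows (invented names; the shape matters, not the API): each node joins its own slice of the facts against its full replicated copy of the dimensions, so no network hop is needed.

```python
# One node's view of the grid: its partition of the facts, plus a full
# replicated copy of the (small, connected) dimensions.
local_trades = [
    {"id": 1, "ccy": "EUR", "notional": 5_000_000},
    {"id": 2, "ccy": "USD", "notional": 1_000_000},
]
replicated_ccy = {"EUR": {"dps": 2}, "USD": {"dps": 2}}

# The 'join' is a local dictionary lookup: a single step, no network hop.
joined = [{**t, **replicated_ccy[t["ccy"]]} for t in local_trades]
assert all("dps" in row for row in joined)
```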
The End
• Further details online: http://www.benstopford.com
• Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
hellipor this
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
all at the speed of thishellip well almost
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
But there is a fly in the ointmenthellip
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
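The recursion itself can be sketched as follows (an illustrative structure, not ODC's actual implementation): on save, walk the saved entity's foreign-key arcs, copying each newly reached dimension into the replicated cache and recursing from there.

```python
def replicate_connected(entity, references, replicated_cache, seen=None):
    """Recurse through an entity's foreign-key arcs, copying every newly
    reached dimension into the replicated cache (the 'connected caches')."""
    if seen is None:
        seen = set()
    for ref in references.get(entity, []):
        if ref not in seen:
            seen.add(ref)
            replicated_cache.add(ref)  # push to the replicated layer
            replicate_connected(ref, references, replicated_cache, seen)

# Toy domain model: saving a Trade triggers its first-level references,
# and the walk recurses on through Party, Book, etc.
references = {
    "Trade:T1": ["Party:P1", "Book:B1", "Ccy:USD"],
    "Party:P1": ["Alias:A1", "Source:S1"],
    "Book:B1": ["LedgerBook:L1"],
}
cache = set()
replicate_connected("Trade:T1", references, cache)
print(sorted(cache))
# ['Alias:A1', 'Book:B1', 'Ccy:USD', 'LedgerBook:L1', 'Party:P1', 'Source:S1']
```

Note that the fact itself stays partitioned; only the dimensions it reaches are replicated.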
Limitations of this approach
• Data set size: the size of connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).
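The partitioning-key constraint can be illustrated with a toy router (a hypothetical scheme, not ODC's): two facts can be joined in a single step only when their partitioning keys route them to the same partition, whereas replicated dimensions are joinable anywhere.

```python
import zlib

N_PARTITIONS = 4

def partition_of(partitioning_key):
    """Stable hash routing: facts with the same key land in the same partition."""
    return zlib.crc32(partitioning_key.encode()) % N_PARTITIONS

def single_step_join(fact_a, fact_b):
    """True when both facts live in the same partition, so the join is local."""
    return partition_of(fact_a["pkey"]) == partition_of(fact_b["pkey"])

trade = {"id": "Trade:T1", "pkey": "T1"}
mtm = {"id": "Mtm:M1", "pkey": "T1"}       # MTM keyed by its trade
other = {"id": "Cashflow:C9", "pkey": "T2"}

print(single_step_join(trade, mtm))   # True: same partitioning key
print(single_step_join(trade, other))
```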
Conclusion
• Traditional database architectures are inappropriate for very low-latency or very high-throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example, ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning so we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
I lied earlier These arenrsquot all Facts
Facts
Dimensions
This is a dimensionbull It has a different
key to the Factsbull And itrsquos BIG
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
We canrsquot replicate really big stuffhellip wersquoll run out of space =gt Big Dimensions are a problem
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Fortunately there is a simple solution
The Connected Replication
Pattern
Whilst there are lots of these big dimensions a large majority are never used They are not all ldquoconnectedrdquo
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a "distributed join" => Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely: Why does normalisation mean we have to sp
- It's all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are big, dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid. We should avoid the distributed join
- … so we only want to 'join' data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned, dimensions are replicated
- Facts are partitioned, dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how do they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we don't need all
- Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'
- Stage 1: Get the right keys to query the Facts
- Stage 2: Cluster Join to get Facts
- Stage 2: Join the facts together efficiently as we know they ar
- Stage 3: Augment raw Facts with relevant Dimensions
- Stage 3: Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do this…
- …and this…
- and this
- …without the problems of this…
- …or this
- all at the speed of this… well almost
- Slide 121
- But there is a fly in the ointment…
- I lied earlier. These aren't all Facts
- We can't replicate really big stuff… we'll run out of space =>
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our 'Connected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all its 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Whilst there are lots of these big dimensions, a large majority are never used. They are not all "connected".
If there are no Trades for Goldmans in the data store, then a Trade query will never need the Goldmans Counterparty.
Looking at the Dimension data, some are quite large.
But Connected Dimension Data is tiny by comparison.
One recent independent study from the database community showed that 80% of data remains unused.
So we only replicate 'Connected' or 'Used' dimensions.
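The idea can be sketched in a few lines (an illustrative Python sketch, not ODC's actual implementation — the fact and dimension names are invented): only dimension keys that some stored fact actually references count as "connected", and only those are candidates for replication.

```python
# Illustrative sketch: only dimensions referenced by a stored fact
# ("connected" dimensions) are candidates for replication.

facts = [
    {"id": "T1", "counterparty": "CP-GS", "ccy": "USD"},
    {"id": "T2", "counterparty": "CP-GS", "ccy": "EUR"},
]

# The full dimension data is much larger than what the facts touch.
all_dimensions = {
    "counterparty": {"CP-GS", "CP-MS", "CP-DB"},
    "ccy": {"USD", "EUR", "GBP", "JPY"},
}

def connected_dimensions(facts):
    """Return, per dimension type, only the keys some fact actually uses."""
    connected = {}
    for fact in facts:
        for dim, key in fact.items():
            if dim == "id":  # the fact's own (partitioning) key, not a dimension
                continue
            connected.setdefault(dim, set()).add(key)
    return connected

replicated = connected_dimensions(facts)
# CP-MS and CP-DB are never referenced, so they are never replicated.
```

The gap between `all_dimensions` and `replicated` is exactly the "80% unused" observation above: the replicated layer only ever has to carry the connected slice.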
As data is written to the data store we keep our 'Connected Caches' up to date.
[Diagram: the Data Layer holds Fact Storage (Partitioned) for Transactions, Cashflows and MTMs; the Processing Layer holds the Dimension Caches (Replicated).]
As new Facts are added, relevant Dimensions that they reference are moved to processing layer caches.
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change.
Saving a trade causes all its 1st level references to be triggered.
[Diagram: a "Save Trade" into the Partitioned Cache fires a Trigger in the Cache Store; the Trade's first-level references (Party, Alias, Source, Book, Ccy) flow from the Data Layer (All Normalised) towards the Query Layer (With connected dimension Caches).]
This updates the connected caches.
[Diagram: the same object graph (Trade → Party, Alias, Source, Book, Ccy), with the referenced dimensions now present in the Query Layer's connected dimension caches.]
The process recurses through the object graph.
[Diagram: the recursion continues one level deeper, pulling second-level references (Party, LedgerBook) from the Data Layer (All Normalised) into the Query Layer (With connected dimension Caches).]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.
With 'Connected Replication', only 1/10th of the data needs to be replicated (on average).
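The pattern above can be sketched as a save-time trigger (an illustrative Python sketch under assumed entity names — ODC itself is a data grid, not this toy dictionary store): saving a fact walks its foreign keys recursively and copies every reachable dimension into the replicated cache.

```python
# Illustrative sketch of the Connected Replication pattern: saving a
# fact recursively walks its foreign keys and copies every reachable
# ("connected") dimension into a replicated cache.

# A toy normalised store: each entity lists the keys it references.
normalised_store = {
    "trade:1": {"refs": ["party:gs", "book:b7", "ccy:usd"]},
    "party:gs": {"refs": ["alias:gold"]},
    "book:b7": {"refs": ["ledgerbook:l1"]},
    "ccy:usd": {"refs": []},
    "alias:gold": {"refs": []},
    "ledgerbook:l1": {"refs": []},
    "party:unused": {"refs": []},  # never referenced => never replicated
}

replicated_cache = {}

def on_save(key):
    """Trigger fired when an entity is saved: replicate its connected dimensions."""
    for ref in normalised_store[key]["refs"]:
        if ref not in replicated_cache:   # avoid re-work (and cycles)
            replicated_cache[ref] = normalised_store[ref]
            on_save(ref)                  # recurse through the object graph

on_save("trade:1")
# party:gs, alias:gold, book:b7, ledgerbook:l1 and ccy:usd are now
# replicated; party:unused is not, and the trade itself stays partitioned.
```

Note the fact itself never enters the replicated cache — facts stay in partitioned storage; only the dimensions they connect to are pushed everywhere.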
Limitations of this approach
• Data set size: the size of connected dimensions limits scalability.
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported).
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
Conclusion
At one end of the scale are the huge shared nothing architectures. These favour scalability.
Conclusion
At the other end are in-memory architectures, ideally using a single address space.
Conclusion
You can blend the two approaches (for example ODC).
Conclusion
ODC attacks the Distributed Join Problem in an unusual way.
Conclusion
By balancing Replication and Partitioned Storage we can do any join in a single step.
Conclusion
With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
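The single-step join claim can be illustrated concretely (a Python sketch with hypothetical names, not ODC's API): because every node carries a full replica of the connected dimensions, a fact can be joined to all its dimensions without leaving its partition.

```python
# Illustrative sketch: replicated dimensions + partitioned facts mean
# a fact-to-dimension join never needs a network hop.

# One partition of the fact storage (facts partitioned by trade key).
partition_facts = {
    "trade:1": {"party": "party:gs", "ccy": "ccy:usd"},
    "trade:2": {"party": "party:gs", "ccy": "ccy:eur"},
}

# The connected-dimension cache, replicated onto *every* node.
replicated_dims = {
    "party:gs": {"name": "Goldmans"},
    "ccy:usd": {"code": "USD"},
    "ccy:eur": {"code": "EUR"},
}

def local_join(fact_key):
    """Join a fact to its dimensions in a single, partition-local step."""
    fact = partition_facts[fact_key]
    return {dim: replicated_dims[key] for dim, key in fact.items()}

result = local_join("trade:1")
```

Every lookup in `local_join` hits local memory, which is why the earlier three-stage query plan never needs a distributed join once the right partition has been reached.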
The End
• Further details online: http://www.benstopford.com
• Questions?
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Looking at the Dimension data some are quite large
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
But Connected Dimension Data is tiny by comparison
One recent independent study from the database community showed that 80 of data remains unused
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- … so we only want to 'join' data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how do they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do this…
- …and this…
- and this
- …without the problems of this…
- …or this
- all at the speed of this… well almost
- Slide 121
- But there is a fly in the ointment…
- I lied earlier These arenrsquot all Facts
- We can't replicate really big stuff… we'll run out of space =>
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our 'Connected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all its 1st-level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
One recent independent study from the database community showed that 80% of data remains unused
So we only replicate 'Connected' or 'Used' dimensions
As data is written to the data store we keep our 'Connected Caches' up to date.
[Diagram: Data Layer with Fact Storage (Partitioned) holding Transactions, Cashflows and Mtms; Processing Layer with Dimension Caches (Replicated)]
As new Facts are added, relevant Dimensions that they reference are moved to processing-layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all its 1st-level references to be triggered
[Diagram: a Save Trade on the Partitioned Cache fires a Cache Store Trigger; the Trade references Party, Alias, Source, Book and Ccy; Data Layer (All Normalised), Query Layer (with connected dimension caches)]
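The save-trigger flow on the slide can be sketched like this (a minimal illustrative model; the class and function names are assumptions, not ODC's actual API):

```python
# Hypothetical sketch: a write to the partitioned fact cache fires a
# cache-store trigger, which pushes the fact's first-level dimension
# references into the replicated 'connected' caches.

connected_caches = set()   # the Query Layer's replicated dimension caches

class PartitionedCache:
    """Fact storage, partitioned by key, with a cache-store trigger."""
    def __init__(self, trigger):
        self.store = {}
        self.trigger = trigger

    def put(self, key, fact):
        self.store[key] = fact
        self.trigger(fact)          # the trigger fires on every save

def on_save(fact):
    # Push the fact's 1st-level references into the connected caches;
    # deeper levels are handled by recursing through the object graph.
    for dimension in fact.get("refs", []):
        connected_caches.add(dimension)

trades = PartitionedCache(trigger=on_save)
trades.put("T1", {"id": "T1", "refs": ["Party", "Alias", "Source", "Book", "Ccy"]})
print(sorted(connected_caches))
# ['Alias', 'Book', 'Ccy', 'Party', 'Source']
```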
This updates the connected caches
[Diagram: the Trade's 1st-level references (Party, Alias, Source, Book, Ccy) are copied from the Data Layer (All Normalised) into the Query Layer's connected dimension caches]
The process recurses through the object graph
[Diagram: the recursion pulls the next level of references, e.g. LedgerBook via Party, from the Data Layer (All Normalised) into the Query Layer's connected dimension caches]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated
With 'Connected Replication' only 1/10th of the data needs to be replicated (on average)
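The recursion behind Connected Replication can be sketched as follows (a minimal model assuming a simple in-memory object graph; the entity names mirror the diagrams, but the code is illustrative rather than ODC's implementation):

```python
class Entity:
    """A node in the domain model; refs are its foreign-key arcs."""
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)

def connected_dimensions(fact):
    """Recurse through the foreign keys of a fact, collecting every
    dimension reachable from it; only these need replicating."""
    seen = set()
    def walk(entity):
        for ref in entity.refs:
            if ref.name not in seen:
                seen.add(ref.name)
                walk(ref)            # recurse through the object graph
    walk(fact)
    return seen

# A single trade reaches only a small slice of the dimension data:
party = Entity("Party", [Entity("LedgerBook")])
alias = Entity("Alias", [party, Entity("Source")])
trade = Entity("Trade", [alias, Entity("Book", [Entity("Ccy")])])

print(sorted(connected_dimensions(trade)))
# ['Alias', 'Book', 'Ccy', 'LedgerBook', 'Party', 'Source']
```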
Limitations of this approach
• Data set size: size of connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)
Conclusion
• Traditional database architectures are inappropriate for very low-latency or very high-throughput applications.
• At one end of the scale are the huge shared-nothing architectures. These favour scalability.
• At the other end are in-memory architectures, ideally using a single address space.
• You can blend the two approaches (for example, ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning, so we can do any join in a single step against Partitioned Storage.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
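The balance of Replication and Partitioning can be sketched as a toy model (hypothetical layout and names, not ODC's actual storage): facts are spread across nodes by their partitioning key while dimensions sit on every node, so any node can complete its share of a join in-process:

```python
NODES = 3
fact_partitions = [{} for _ in range(NODES)]      # facts, partitioned
dimensions = {"CC1": "Equities", "CC2": "Rates"}  # replicated to every node

def put_fact(trade_id, fact):
    # Partition deterministically by the fact's one shared key.
    fact_partitions[sum(map(ord, trade_id)) % NODES][trade_id] = fact

def query(cost_centre):
    # Each node scans its own partition and joins against its local
    # dimension replica: a single-step, in-process join, no network hops.
    return [dict(fact, desk=dimensions[fact["cc"]])
            for node in fact_partitions
            for fact in node.values()
            if fact["cc"] == cost_centre]

put_fact("T1", {"id": "T1", "cc": "CC1"})
put_fact("T2", {"id": "T2", "cc": "CC2"})
print(query("CC1"))
# [{'id': 'T1', 'cc': 'CC1', 'desk': 'Equities'}]
```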
The End
• Further details online: http://www.benstopford.com
• Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
So we only replicate
lsquoConnectedrsquo or lsquoUsedrsquo dimensions
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
As data is written to the data store we keep our lsquoConnected Cachesrsquo up
to dateD
ata
Layer
Dimension Caches
(Replicated)
Transactions
Cashflows
Pro
cessin
g
Layer
Mtms
Fact Storage(Partitioned)
As new Facts are added relevant Dimensions that they reference are moved to processing layer caches
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
The Replicated Layer is updated by recursing through the arcs on the domain model when facts change
Saving a trade causes all itrsquos 1st level references to be triggered
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
Save Trade
Partitioned Cache
Cache Store
Trigger
This updates the connected caches
Trade
Party
Alias
Source
Book
Ccy
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
The process recurses through the object graph
Trade
Party
Alias
Source
Book
Ccy
Party
LedgerBook
Data Layer(All Normalised)
Query Layer(With connected dimension Caches)
lsquoConnected ReplicationrsquoA simple pattern which recurses through the foreign keys in the
domain model ensuring only lsquoConnectedrsquo dimensions are
replicated
With lsquoConnected Replicationrsquo only 110th of the data
needs to be replicated (on
average)
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- … so we only want to 'join' data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how do they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we don't need all
- Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'
- Stage 1: Get the right keys to query the Facts
- Stage 2: Cluster Join to get Facts
- Stage 2: Join the facts together efficiently, as we know they ar
- Stage 3: Augment raw Facts with relevant Dimensions
- Stage 3: Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do this…
- …and this…
- and this
- …without the problems of this…
- …or this
- all at the speed of this… well, almost
- Slide 121
- But there is a fly in the ointment…
- I lied earlier. These aren't all Facts
- We can't replicate really big stuff… we'll run out of space =>
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our 'Connected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all its 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Saving a trade causes all its 1st level references to be triggered

[Diagram: a Trade referencing Party, Alias, Source, Book and Ccy. Data Layer (All Normalised) and Query Layer (with connected dimension caches); a Save Trade into the Partitioned Cache fires a Cache Store Trigger.]
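The trigger step above can be sketched in code. This is a minimal illustration, not the real ODC implementation: all names here (`Trade`, `Dimension`, `save_trade`, the two cache dicts) are hypothetical stand-ins for the partitioned fact store and the replicated connected-dimension cache.

```python
partitioned_cache = {}   # facts, partitioned across the grid by trade id
connected_cache = {}     # dimensions, replicated to every node

class Dimension:
    def __init__(self, key):
        self.key = key

class Trade:
    def __init__(self, trade_id, refs):
        self.trade_id = trade_id
        self.refs = list(refs)   # first-level dimension references

def save_trade(trade, dimensions):
    """Store the fact, then 'trigger' replication of its direct references."""
    partitioned_cache[trade.trade_id] = trade
    for key in trade.refs:                   # first-level references only
        connected_cache[key] = dimensions[key]

# A dimension nothing references ('unused') is never replicated.
dims = {k: Dimension(k) for k in ("party1", "book1", "ccy1", "unused")}
save_trade(Trade(1, ["party1", "book1", "ccy1"]), dims)
```

Only dimensions actually referenced by a saved fact reach the replicated layer; `unused` stays out of `connected_cache`.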
This updates the connected caches

[Diagram: Trade, Party, Alias, Source, Book, Ccy; Data Layer (All Normalised); Query Layer (with connected dimension caches).]
The process recurses through the object graph

[Diagram: Trade, Party, Alias, Source, Book, Ccy, plus Party's LedgerBook; Data Layer (All Normalised); Query Layer (with connected dimension caches).]
'Connected Replication': a simple pattern which recurses through the foreign keys in the domain model, ensuring only 'Connected' dimensions are replicated.

With 'Connected Replication' only 1/10th of the data needs to be replicated (on average).
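The recursion through the object graph described above can be sketched as a simple graph walk, assuming a toy domain model: starting from a saved fact's references, follow every foreign key and replicate each dimension reached, so unconnected dimensions are never copied. Names (`Dim`, `replicate_connected`) are illustrative.

```python
def replicate_connected(start_keys, dimensions, connected_cache):
    """Recurse through the arcs of the domain model from start_keys,
    replicating every dimension reachable from a saved fact."""
    stack = list(start_keys)
    while stack:
        key = stack.pop()
        if key in connected_cache:
            continue                          # already replicated
        connected_cache[key] = dimensions[key]
        stack.extend(dimensions[key].refs)    # follow its foreign keys

class Dim:
    def __init__(self, refs=()):
        self.refs = list(refs)    # foreign keys to other dimensions

# Trade -> Party -> LedgerBook; 'orphan' is not connected to any trade.
dims = {
    "party": Dim(["ledgerbook"]),
    "ledgerbook": Dim(),
    "orphan": Dim(),
}
cache = {}
replicate_connected(["party"], dims, cache)
```

`party` and the transitively reachable `ledgerbook` are replicated; `orphan` never is, which is why only connected dimension data needs space in the replicated layer.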
Limitations of this approach
• Data set size: the size of connected dimensions limits scalability
• Joins are only supported between "Facts" that can share a partitioning key (but any dimension join can be supported)
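The partitioning-key limitation can be illustrated with a toy routing sketch: two fact types can be joined in-process only when the same key routes both to the same node. The node count, the hash routing, and the `mtm` fact type are assumptions for illustration, not ODC specifics.

```python
NODES = 4
nodes = [dict() for _ in range(NODES)]   # one store per grid node

def route(key):
    """Route a partitioning key to a node (illustrative hash routing)."""
    return hash(key) % NODES

def put(fact_type, key, value):
    nodes[route(key)][(fact_type, key)] = value

def local_join(key):
    """Join two fact types without a network hop: both live on route(key)."""
    store = nodes[route(key)]
    return (store.get(("trade", key)), store.get(("mtm", key)))

# Trades and their valuations share the trade id as partitioning key,
# so each pair is co-located and the join is a local lookup.
put("trade", "t42", {"qty": 100})
put("mtm", "t42", {"value": 9.5})
```

Facts keyed differently would land on different nodes, and joining them would reintroduce the distributed join the architecture is designed to avoid.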
Conclusion
• Traditional database architectures are inappropriate for very low latency or very high throughput applications.
• At one end of the scale are the huge shared nothing architectures. These favour scalability.
• At the other end are in memory architectures, ideally using a single address space.
• You can blend the two approaches (for example ODC).
• ODC attacks the Distributed Join Problem in an unusual way: by balancing Replication and Partitioning (Partitioned Storage) so we can do any join in a single step.
• With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern.
The End
• Further details online: http://www.benstopford.com
• Questions?
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Limitations of this approach
bullData set size Size of connected dimensions limits scalability
bull Joins are only supported between ldquoFactsrdquo that can share a partitioning key (But any dimension join can be supported)
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Conclusion
bull Traditional database architectures are inappropriate for very low latency or very high throughput applications
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Conclusion
At one end of the scale are the huge shared nothing architectures These favour scalability
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are Big dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid We should avoid the distributed join
- hellip so we only want to lsquojoinrsquo data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned dimensions are replicated
- Facts are partitioned dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how does they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we donrsquot need all
- Stage 1 Focus on the where clause Where Cost Centre = lsquoCC1rsquo
- Stage 1 Get the right keys to query the Facts
- Stage 2 Cluster Join to get Facts
- Stage 2 Join the facts together efficiently as we know they ar
- Stage 3 Augment raw Facts with relevant Dimensions
- Stage 3 Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do thishellip
- hellipand thishellip
- and this
- hellipwithout the problems of thishellip
- hellipor this
- all at the speed of thishellip well almost
- Slide 121
- But there is a fly in the ointmenthellip
- I lied earlier These arenrsquot all Facts
- We canrsquot replicate really big stuffhellip wersquoll run out of space =gt
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our lsquoConnected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all itrsquos 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Conclusion
At the other end are in memory architectures ideally using a single address space
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning so we can do any join in a single step
Partitioned Storage
Conclusion
With a lsquotwistrsquo that reduces the amount of data replicated by an order of magnitude The Connected Replication pattern
The Endbull Further details online
httpwwwbenstopfordcom
bull Questions
- Data Storage for Extreme Use Cases The Lay of the Land and a P
- How fast is a HashMap lookup
- Thatrsquos how long it takes light to travel a room
- How fast is a database lookup
- Thatrsquos how long it takes light to go to Australia and back
- Slide 6
- Computers really are very fast
- The problem is wersquore quite good at writing software that slows
- Question Is it fair to compare the performance of a Database
- Of course nothellip
- Mechanical Sympathy
- Key Point 1
- Slide 13
- Slide 14
- Times are changing
- Traditional Database Architecture is Aging
- Slide 17
- The Traditional Architecture
- Slide 19
- Key Point 2
- Slide 21
- How big is the internet
- How big is an average enterprise database
- Slide 24
- Simplifying the Contract
- Databases have huge operational overheads
- Avoid that overhead with a simpler contract and avoiding IO
- Key Point 3
- Key Point 3 (addendum)
- Slide 30
- 1 The Shared Disk Architecture
- 2 The Shared Nothing Architecture
- Each machine is responsible for a subset of the records Each r
- 3 The In Memory Database (single address-space)
- Databases must cache subsets of the data in memory
- Not knowing what you donrsquot know
- If you can fit it ALL in memory you know everything
- The architecture of an in memory database
- Memory is at least 100x faster than disk
- Random vs Sequential Access
- This makes them very fast
- The proof is in the stats TPC-H Benchmarks on a 1TB data set
- So why havenrsquot in-memory databases taken off
- Address-Spaces are relatively small and of a finite fixed size
- Durability
- Slide 46
- Distributed In Memory (Shared Nothing)
- Again we spread our data but this time only using RAM
- Distribution solves our two problems
- We get massive amounts of parallel processing
- Slide 51
- Slide 52
- Key Point 4 There are three key forces
- Slide 54
- ODC
- Slide 56
- What is Latency
- What is Throughput
- Which is best for latency
- Which is best for throughput
- So why do we use distributed in-memory
- ODC ndash Distributed Shared Nothing In Memory Semi-Normalised
- The Layers
- Three Tools of Distributed Data Architecture
- How should we use these tools
- Replication puts data everywhere
- Partitioning scales
- So we have some data Our data is bound together in a model
- Which we save
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot
- The hops have to be spread over time
- Lots of network hops makes it slow
- OK ndash what if we held it all together ldquoDenormalisedrdquo
- Hence denormalisation is FAST (for reads)
- Denormalisation implies the duplication of some sub-entities
- hellipand that means managing consistency over lots of copies
- hellipand all the duplication means you run out of space really quic
- Spaces issues are exaggerated further when data is versioned
- And reconstituting a previous time slice becomes very difficult
- Slide 80
- Remember this means the object graph will be split across multi
- Binding them back together involves a ldquodistributed joinrdquo =gt Lot (2)
- Whereas the denormalised model the join is already done
- So what we want is the advantages of a normalised store at the
- Looking more closely Why does normalisation mean we have to sp
- Itrsquos all about the keys
- We can collocate data with common keys but if they crosscut the
- We tackle this problem with a hybrid model
- We adapt the concept of a Snowflake Schema
- Taking the concept of Facts and Dimensions
- Everything starts from a Core Fact (Trades for us)
- Facts are big, dimensions are small
- Facts have one key that relates them all (used to partition)
- Dimensions have many keys (which crosscut the partitioning key
- Looking at the data
- We remember we are a grid. We should avoid the distributed join
- … so we only want to 'join' data that is in the same process
- So we prescribe different physical storage for Facts and Dimens
- Facts are partitioned, dimensions are replicated
- Facts are partitioned, dimensions are replicated (2)
- The data volumes back this up as a sensible hypothesis
- Key Point
- Slide 103
- So how do they help us to run queries without distributed joi
- What would this look like without this pattern
- But by balancing Replication and Partitioning we don't need all
- Stage 1: Focus on the where clause: Where Cost Centre = 'CC1'
- Stage 1: Get the right keys to query the Facts
- Stage 2: Cluster Join to get Facts
- Stage 2: Join the facts together efficiently as we know they ar
- Stage 3: Augment raw Facts with relevant Dimensions
- Stage 3: Bind relevant dimensions to the result
- Bringing it together
- Slide 114
- We get to do this…
- …and this…
- and this
- …without the problems of this…
- …or this
- all at the speed of this… well, almost
- Slide 121
- But there is a fly in the ointment…
- I lied earlier. These aren't all Facts
- We can't replicate really big stuff… we'll run out of space =>
- Fortunately there is a simple solution
- Whilst there are lots of these big dimensions, a large majority
- If there are no Trades for Goldmans in the data store then a Tr
- Looking at the Dimension data, some are quite large
- But Connected Dimension Data is tiny by comparison
- One recent independent study from the database community showed
- Slide 131
- As data is written to the data store we keep our 'Connected Cac
- The Replicated Layer is updated by recursing through the arcs o
- Saving a trade causes all its 1st level references to be trigg
- This updates the connected caches
- The process recurses through the object graph
- Slide 137
- Slide 138
- Limitations of this approach
- Conclusion
- Conclusion (2)
- Conclusion (3)
- Conclusion (4)
- Conclusion (5)
- Conclusion (6)
- Conclusion (7)
- The End
-
Conclusion
You can blend the two approaches (for example ODC)
Conclusion
ODC attacks the Distributed Join Problem in an unusual way
Conclusion
By balancing Replication and Partitioning we can do any join in a single step
Partitioned Storage
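The single-step join above can be sketched as a toy grid: fact records (trades) are partitioned across nodes by their one shared key, while dimension records are replicated to every node, so any fact-to-dimension join resolves inside a single process with no network hop. All the names here (`Node`, `Grid`, `ccp_id`) are illustrative, not ODC's actual API.

```python
# Sketch (hypothetical names): facts partitioned, dimensions replicated,
# so a fact can always be joined to its dimensions in one process.

class Node:
    def __init__(self):
        self.facts = {}        # partitioned: only this node's share of trades
        self.dimensions = {}   # replicated: a full copy lives on every node

class Grid:
    def __init__(self, node_count):
        self.nodes = [Node() for _ in range(node_count)]

    def _owner(self, trade_id):
        # Partitioning: one owner per fact key.
        return self.nodes[hash(trade_id) % len(self.nodes)]

    def put_fact(self, trade_id, trade):
        self._owner(trade_id).facts[trade_id] = trade

    def put_dimension(self, dim_key, dim):
        # Replication: push to every node, so no join ever leaves a process.
        for node in self.nodes:
            node.dimensions[dim_key] = dim

    def get_trade_with_dimensions(self, trade_id):
        # Single-step 'join': resolved entirely on the owning node.
        owner = self._owner(trade_id)
        trade = owner.facts[trade_id]
        return {**trade, "counterparty": owner.dimensions[trade["ccp_id"]]}

grid = Grid(node_count=4)
grid.put_dimension("CCP1", {"name": "Counterparty A"})
grid.put_fact("T1", {"amount": 100, "ccp_id": "CCP1"})
print(grid.get_trade_with_dimensions("T1"))
# {'amount': 100, 'ccp_id': 'CCP1', 'counterparty': {'name': 'Counterparty A'}}
```

A real data grid would route by a stable partitioning function rather than Python's `hash()`, which is randomized per process (it is only consistent within one run, which is enough for this sketch).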
Conclusion
With a 'twist' that reduces the amount of data replicated by an order of magnitude: the Connected Replication pattern
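A minimal sketch of the Connected Replication 'twist': instead of replicating every dimension, the replicated layer holds only dimensions reachable from a fact that is actually in the store. Saving a trade recurses through its references and promotes each connected dimension. The data shapes and names below are hypothetical, assumed for illustration.

```python
# Connected Replication sketch: only 'connected' dimensions are replicated.

all_dimensions = {                 # master dimension data (large, partitioned)
    "book1": {"refs": ["desk1"]},  # each dimension may reference others
    "desk1": {"refs": []},
    "ccpX":  {"refs": []},         # never referenced by any stored trade
}

replicated = {}                    # the connected caches: start empty

def save_trade(trade):
    # Recurse through the arcs of the domain model, promoting every
    # dimension the new fact connects to, and their references in turn.
    pending = list(trade["refs"])
    while pending:
        key = pending.pop()
        if key not in replicated:
            replicated[key] = all_dimensions[key]
            pending.extend(all_dimensions[key]["refs"])

save_trade({"id": "T1", "refs": ["book1"]})
print(sorted(replicated))   # ['book1', 'desk1'] (ccpX is never replicated)
```

The saving, so the conclusion goes, comes from the gap between all dimension data and connected dimension data: unreferenced entries like `ccpX` never consume replicated space.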
The End
• Further details online: http://www.benstopford.com
• Questions