building tiered data stores using aesop to bridge sql and no sql systems

19
Building tiered data stores using Aesop to bridge SQL and NoSQL systems Regunath B twitter.com/RegunathB github.com/regunathb Joint work with : Jagadeesh Huliyar, Shoury Bharadwaj, Arya Ketan, Nikhil Bafna, Pratyay Banerjee, Sneha Shukla, Yogesh Dahiya, Kartik Ukhalkar, Milap Wadhwa, Ravindra Yadav, Santosh Patil,Rishab Dua & others at Flipkart

Upload: regunath-balasubramanian

Post on 08-Aug-2015

470 views

Category:

Software


4 download

TRANSCRIPT

Building tiered data stores using Aesop to bridge

SQL and NoSQL systems

Regunath B

twitter.com/RegunathB

github.com/regunathb

Joint work with : Jagadeesh Huliyar, Shoury Bharadwaj, Arya Ketan, Nikhil Bafna, Pratyay Banerjee, Sneha Shukla, Yogesh Dahiya, Kartik Ukhalkar, Milap Wadhwa, Ravindra Yadav, Santosh Patil,Rishab Dua & others at Flipkart

What Data store?• Often heard : I need to scale and therefore will use a NoSQL

database - Really?

• By Type,Access : Relational, KV, Document, Columnar

• Scaling limits of single node databases, product claims & benchmarks of distributed data stores

• Rise of memory-based, flash-optimised data stores (Log structured)

• CAP tradeoffs - Consistency vs. Availability

• Other considerations : Durability of data, disk-to-memory ratios, size of single cluster etc.

Scaling an e-commerce website

Source: Wayback Machine (ca. 2007)

MySQL&Master&

Website&Writes&

MySQL&Slave&

Website&Reads&

Analy5cs&Reads&

Replica5on&MySQL&Master&

Website&Writes&

MySQL&Slave&1&

Website&Reads&

Replica6on&

MySQL&Slave&2&

Analy6cs&Reads&

Replica6on&

MySQL&Master&

Writes&

MySQL&Slave&

Reads&

Replica4on&

Scaling the Data store

Source: http://www.slideshare.net/slashn/slash-n-tech-talk-track-2-website-architecturemistakes-learnings-siddhartha-reddy

and meanwhile… traded Consistency for Availability

Scaling the data store (polyglot persistence)

Data Store

User session Memcached, HBase

Product HBase, Elastic Search, Redis

Cart MySQL, MongoDB

Notifications HBase

Search Solr, Neo4j

Recommendations Hadoop MR, Redis/Aerospike

Promotions (WIP) HBase, Redis

Pattern : Serving Layer + Source of truth data store

Scaling payments data store(s)

Data Flow

TransactionDetails

txn_id merchant_id amount type status bank_namePayments Denormalised DB (Refunds, Consoles) [MySQL]

TransactionSaleDetail

TransactionSummary

TransactionMasterArchive DB (Regulatory Req.) [HBase]

txn_id merchant_id amount type

txn_id status message

txn_id payment_method bank_name

TransactionSaleDetail

TransactionSummary

TransactionMaster

txn_id merchant_id amount type Payments DB (online,live TXNs) [MySQL]

txn_id status message

txn_id payment_method bank_name

TransactionDetails Analytics & Reporting DB (Reports, Queries) [HBase]

txn_id merchant_id amount type status bank_name pmt_method message

Data consistency in polyglot persistence

Caching/Serving Layer challenges

There are only two hard things in Computer Science: cache invalidation and naming things. -- Phil Karlton

• (Low Cache TTLs + Lazy caching + High Request concurrency) results in: • Thundering herds • Exposes Availability of primary data store

• (Cache size + distribution + no. of replicas) results in: • Hard to implement write-through mechanisms to maintain consistency

Serving Layer is Eventually Consistent, at best

Eventual Consistency• Replicas converge over time • Pros • Scale reads through multiple replicas • Higher overall data availability

• Cons • Reads return live data before convergence

• Achieving Eventual Consistency is not easy • Trivially requires Atleast-Once delivery guarantee of updates to all replicas

Aesop : Change data capture, propagation

Producer, Relay, ConsumerMySQL HBase

Subscriber Subscriber Subscriber Subscriber Subscriber

Txn Logs WAL edits

Event Ring Buffer

Event Ring Buffer

Producers

Relay (Data store agnostic m-mapped file containing data update events - Avro

serialized)

ConsumersTransform

& MapTransform

& MapTransform

& MapTransform

& MapTransform

& Map

Log Mining (Old wine in new bottle?)

• "Durability is typically implemented via logging and recovery.” [1] Architecture of a Database System - Michael Stonebraker et.al (2007)

• "The contents of the DB are a cache of the latest records in the log. The truth is the log. The database is a cache of a subset of the log.” - Jay Kreps (creator of Kafka, 2015)

• WAL (write ahead log) ensures:

• Each modification is flushed to disk

• Log records are in order

Data Consistency & Aesop

Consistency Model-map source: https://aphyr.com/

• Sequential Consistency:

• Says nothing about time - there is no reference to the “most recent” write operation

• Mechanisms for Sequential Consistency : Primary-based replication protocols (used by Aesop)

• Strict consistency (e.g Linearizability) : too hard & requires coordination across replicas, relies on absolute global time

• Less strict : Replicas can disagree for ever

Reliable deliveryData store

w4 w5 w6 w7 w8 w9 w10 w11Transaction Logs

Producer

Tailer (Slave/Replica)

Subscriber Subscriber Subscriber

Application Data stores

Zookeeper/ Filesystem

SCN (checkpoints)

Event Ring Buffer

Relayw8 w8 w8

Event consumption

• Clustering

• Data transformation (Data Layer) : Multiple destinations• MySQL• Elastic Search• HBase• Kafka

Monitoring Console

SummaryAesop - pub-sub like Change Capture, propagation system

Performance : • Relay : 1 XL VM (8 core, 32GB)

• Consumers : 4 XL VM, 200 partitions

• Throughput : 30K Inserts per sec (MySQL to HBase)

• Data size : 800 GB

• Not exactly-once delivery

• Not a storage system

• No global ordering across different data-stores (consumers)

What it isn't :

What it is : • Supports multiple data stores

• Delivers updates reliably - at least once, in-order

• Supports varying consumer speeds

More details..• Open Source : https://github.com/Flipkart/aesop

• Support : [email protected]

• Multiple production deployments at Flipkart

Project :

Related Work : • LinkedIn Databus

• [2] Facebook Wormhole

• [1] Architecture of a Database System : http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf

• [2] Wormhole Paper: https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-sharma.pdf

References :