Dynamo and BigTable - Review and Comparison
![Page 1: Dynamo and BigTable - Review and Comparison](https://reader031.vdocuments.mx/reader031/viewer/2022020208/55a5c4ab1a28abe16d8b458c/html5/thumbnails/1.jpg)
Dynamo and BigTable Review and Comparison
IEEEI 2014 Grisha Weintraub
![Page 2: Dynamo and BigTable - Review and Comparison](https://reader031.vdocuments.mx/reader031/viewer/2022020208/55a5c4ab1a28abe16d8b458c/html5/thumbnails/2.jpg)
Outline
• Introduction to NoSQL
• Introduction to Dynamo and BigTable
• Dynamo vs. BigTable comparison
• Open source implementations
![Page 3: Dynamo and BigTable - Review and Comparison](https://reader031.vdocuments.mx/reader031/viewer/2022020208/55a5c4ab1a28abe16d8b458c/html5/thumbnails/3.jpg)
Introduction to NoSQL
• New generation of databases
• Response to a “big data” challenge
• Main characteristics:
– Non-relational
– Distributed
– Fault tolerant
– Scalable
![Page 4: Dynamo and BigTable - Review and Comparison](https://reader031.vdocuments.mx/reader031/viewer/2022020208/55a5c4ab1a28abe16d8b458c/html5/thumbnails/4.jpg)
Introduction to NoSQL
![Page 5: Dynamo and BigTable - Review and Comparison](https://reader031.vdocuments.mx/reader031/viewer/2022020208/55a5c4ab1a28abe16d8b458c/html5/thumbnails/5.jpg)
Dynamo and BigTable - Introduction
Dynamo (Amazon)
• Giuseppe DeCandia, et al.: Dynamo: Amazon's Highly Available Key-value Store. SOSP 2007
BigTable (Google)
• Fay Chang, et al.: Bigtable: A Distributed Storage System for Structured Data. OSDI 2006
Terms highlighted on the slide: "Highly Available" and "Key-value" (Dynamo), "Structured Data" (BigTable)
![Page 6: Dynamo and BigTable - Review and Comparison](https://reader031.vdocuments.mx/reader031/viewer/2022020208/55a5c4ab1a28abe16d8b458c/html5/thumbnails/6.jpg)
Dynamo vs. BigTable
Comparison criteria (one row per criterion, with a BigTable column and a Dynamo column):
• Architecture
• Data model
• API
• Security
• Partitioning
• Replication
• Storage
• Membership and failure detection
![Page 7: Dynamo and BigTable - Review and Comparison](https://reader031.vdocuments.mx/reader031/viewer/2022020208/55a5c4ab1a28abe16d8b458c/html5/thumbnails/7.jpg)
Architecture
Dynamo
• Decentralized:
– Every node has the same set of responsibilities as its peers.
– There is no single point of failure.
BigTable
• Centralized:
– A single master node maintains all system metadata.
– Other nodes (tablet servers) handle read and write requests.
![Page 8: Dynamo and BigTable - Review and Comparison](https://reader031.vdocuments.mx/reader031/viewer/2022020208/55a5c4ab1a28abe16d8b458c/html5/thumbnails/8.jpg)
Data Model
Dynamo
• Key-value: data is stored as <key, value> pairs, where the key is a unique identifier and the value is an arbitrary entry.
BigTable
• Multidimensional sorted map: the map is indexed by a row key and a column key and is ordered by row key. Column keys are grouped into sets called column families.
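Dynamo example (key-value pairs):

| Key | Value |
| --- | --- |
| 188 | { "Name" : "John", "Email" : "[email protected]", "Card" : "6652" } |
| 145 | { "Name" : "Bob", "Phone" : "781455", "Card" : "9875" } |

BigTable example (User ID is the row key; Personal Data and Financial Data are column families; Name, Phone, Email and Card are column keys):

| User ID | Personal Data | Financial Data |
| --- | --- | --- |
| 145 | Name = "Bob", Phone = "781455" | Card = "9875" |
| 188 | Name = "John", Email = "[email protected]" | Card = "6652" |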
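The two data models above can be sketched with plain Python dictionaries. This is only an illustration under simplifying assumptions: the variable and function names are made up for this example, the email column is left out, and the timestamp dimension of the original BigTable design is ignored, as it is on the slide.

```python
# Dynamo-style: each key maps to one opaque value (here, serialized bytes).
dynamo_store = {
    "188": b'{"Name": "John", "Card": "6652"}',
    "145": b'{"Name": "Bob", "Phone": "781455", "Card": "9875"}',
}

# BigTable-style: row key -> column family -> column key -> value.
bigtable_rows = {
    "188": {
        "Personal Data": {"Name": "John"},
        "Financial Data": {"Card": "6652"},
    },
    "145": {
        "Personal Data": {"Name": "Bob", "Phone": "781455"},
        "Financial Data": {"Card": "9875"},
    },
}


def read_cell(rows, row_key, family, column):
    """Address a single cell by row key, column family and column key."""
    return rows[row_key][family][column]


def rows_in_order(rows):
    """Row keys in sorted order, mirroring the 'ordered by row key' property."""
    return sorted(rows)
```

The key-value store can only return the whole value for a key, while the sorted map can address a single cell and walk rows in key order.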
![Page 9: Dynamo and BigTable - Review and Comparison](https://reader031.vdocuments.mx/reader031/viewer/2022020208/55a5c4ab1a28abe16d8b458c/html5/thumbnails/9.jpg)
API
Dynamo
• get – returns an object associated with the given key.
• put – associates the given object with the specified key.
BigTable
• get – returns values from the individual rows.
• scan – iterates over multiple rows.
• put – inserts a value to the specified table's cell.
• delete – deletes a whole row or a specified cell inside a particular row.
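A minimal in-memory sketch of the two interfaces listed above. The class names `KeyValueStore` and `SortedTable` are hypothetical, and the signatures are simplified to match the slide rather than any real client library.

```python
import bisect


class KeyValueStore:
    """Dynamo-style interface from the slide: get and put on whole objects."""

    def __init__(self):
        self._data = {}

    def put(self, key, obj):
        self._data[key] = obj

    def get(self, key):
        return self._data.get(key)


class SortedTable:
    """BigTable-style interface from the slide: get, scan, put, delete."""

    def __init__(self):
        self._keys = []  # row keys, kept sorted so scan can use binary search
        self._rows = {}  # row key -> {column key: value}

    def put(self, row, column, value):
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row):
        return dict(self._rows.get(row, {}))

    def scan(self, start, stop):
        """Yield (row key, cells) for start <= row key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, dict(self._rows[key])

    def delete(self, row, column=None):
        """Delete one cell, or the whole row when no column is given."""
        if row not in self._rows:
            return
        if column is None:
            del self._rows[row]
            self._keys.pop(bisect.bisect_left(self._keys, row))
        else:
            self._rows[row].pop(column, None)
```

Keeping the row keys sorted is what makes `scan` cheap here; the key-value interface has no equivalent operation.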
![Page 10: Dynamo and BigTable - Review and Comparison](https://reader031.vdocuments.mx/reader031/viewer/2022020208/55a5c4ab1a28abe16d8b458c/html5/thumbnails/10.jpg)
Security
Dynamo
• No security features
BigTable
• Access control rights are granted at column family level.
| Row Key | Personal Data | Financial Data |
| --- | --- | --- |
| 145 | Name = "Bob", Phone = "781455" | Card = "9875" |
| 188 | Name = "John", Email = "[email protected]" | Card = "6652" |

Example access levels shown on the slide:
• Views Personal Data
• Views/Updates Personal Data
• Views/Updates all the Data
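Access control at the column family level could look roughly like the sketch below. The user names, the `ACL` structure and the `is_allowed` helper are all hypothetical; they only mirror the three access levels from the slide.

```python
# Hypothetical access control list: user -> column family -> allowed actions.
ACL = {
    "viewer": {"Personal Data": {"read"}},
    "editor": {"Personal Data": {"read", "write"}},
    "admin": {
        "Personal Data": {"read", "write"},
        "Financial Data": {"read", "write"},
    },
}


def is_allowed(user, family, action):
    """Return True if the user may perform the action on the column family."""
    return action in ACL.get(user, {}).get(family, set())
```

Because rights attach to a column family, a single check covers every column key in that family, for every row.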
![Page 11: Dynamo and BigTable - Review and Comparison](https://reader031.vdocuments.mx/reader031/viewer/2022020208/55a5c4ab1a28abe16d8b458c/html5/thumbnails/11.jpg)
Partitioning
Dynamo
• Consistent hashing:
– Each node is assigned a random position on the ring.
– Each key is hashed to a fixed point on the ring.
– The node responsible for a key is found by walking clockwise from the key's hash position.
BigTable
• Data is stored ordered by row key.
• Each table consists of a set of tablets.
• Each tablet is assigned to exactly one tablet server.
• The METADATA table stores the location of a tablet under a row key.
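The consistent hashing steps for Dynamo can be sketched as follows. This is a minimal illustration, not Dynamo's implementation: the `Ring` class is hypothetical, node positions are passed in explicitly so the example is reproducible (Dynamo assigns them randomly), and MD5 is used for key hashing.

```python
import bisect
import hashlib


def key_position(key):
    """Hash a key to a fixed point on the ring (a 128-bit integer)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    def __init__(self, positions):
        # positions: node name -> position on the ring
        self._points = sorted((pos, node) for node, pos in positions.items())
        self._positions = [pos for pos, _ in self._points]

    def node_at(self, point):
        """Walk clockwise from point to the first node with a larger position."""
        index = bisect.bisect_right(self._positions, point)
        # Past the last node, the walk wraps around to the first one.
        return self._points[index % len(self._points)][1]

    def node_for(self, key):
        return self.node_at(key_position(key))
```

With `Ring({"A": 10, "B": 40, "C": 70})`, a point at 15 falls to node B, and any point at or beyond 70 wraps around to node A. Adding or removing a node only changes the owner of keys between that node and its predecessor on the ring.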
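[Figure: consistent hashing ring with nodes A to G; hash(key) marks the key's position on the ring.]
[Figure: a table ordered by id and split into tablets. Tablet 1 holds ids 15000 to 20000 and Tablet 2 holds ids 20001 to 25000. Tablet Server 1 holds Tablet-51, Tablet-32, Tablet-16 and Tablet-1; Tablet Server 2 holds Tablet-11, Tablet-7, Tablet-8 and Tablet-21.]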
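Key-range partitioning in BigTable can be sketched as a lookup over tablet end keys. The `METADATA` list below is a hypothetical stand-in for the METADATA table: it reuses the id ranges from the slide, the tablet-to-server assignment is assumed, and any id up to 20000 is treated as belonging to Tablet 1.

```python
import bisect

# Hypothetical METADATA contents, sorted by the last row key of each tablet:
# (last row key in the tablet, tablet name, tablet server holding it)
METADATA = [
    (20000, "Tablet 1", "Tablet Server 1"),
    (25000, "Tablet 2", "Tablet Server 2"),
]
END_KEYS = [end for end, _, _ in METADATA]


def locate(row_key):
    """Return (tablet, tablet server) for the tablet whose range covers row_key."""
    index = bisect.bisect_left(END_KEYS, row_key)
    if index == len(METADATA):
        raise KeyError(f"no tablet covers row key {row_key}")
    _, tablet, server = METADATA[index]
    return tablet, server
```

Because rows are stored in key order, neighbouring ids such as 20000 and 20001 sit at a tablet boundary, while a whole range like 15000 to 20000 is served by a single tablet server.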
![Page 12: Dynamo and BigTable - Review and Comparison](https://reader031.vdocuments.mx/reader031/viewer/2022020208/55a5c4ab1a28abe16d8b458c/html5/thumbnails/12.jpg)
Replication
Dynamo
• Each data item is replicated at N nodes (N is a user-defined parameter).
• Each key K is assigned to a coordinator node.
• The coordinator stores the data associated with K locally and also replicates it at the N-1 healthy clockwise successor nodes in the ring.
BigTable
• Each tablet is stored in GFS as a sequence of read-only files called SSTables.
• SSTables are divided into fixed-size chunks, and these chunks are stored on chunkservers.
• Each chunk in GFS is replicated across multiple chunkservers.
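[Figure: Dynamo ring with nodes A to G and N = 3; hash(key) marks the key's position on the ring.]
[Figure: SSTable1, SSTable2 and SSTable3 are split into Chunk1, Chunk2 and Chunk3; each chunk appears on two of the three chunkservers (Chunkserver 1, Chunkserver 2, Chunkserver 3).]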
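The Dynamo replication rule can be sketched as choosing the first N healthy nodes clockwise from the key's position. The function name `replica_nodes`, the node positions and the `healthy` set are assumptions made for this illustration.

```python
import bisect


def replica_nodes(ring, point, n, healthy):
    """Pick the first n healthy nodes clockwise from point.

    ring: list of (position, node) tuples sorted by position.
    The first node returned acts as the coordinator for the key.
    """
    positions = [pos for pos, _ in ring]
    start = bisect.bisect_right(positions, point)
    chosen = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node in healthy and node not in chosen:
            chosen.append(node)
        if len(chosen) == n:
            break
    return chosen


# Hypothetical ring with seven nodes, as in the figure.
RING = [(10, "A"), (20, "B"), (30, "C"), (40, "D"), (50, "E"), (60, "F"), (70, "G")]
```

For a key that hashes between A and B with N = 3, the data lands on B, C and D. If C is unhealthy, the walk simply continues to E, so the key still has three replicas.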
![Page 13: Dynamo and BigTable - Review and Comparison](https://reader031.vdocuments.mx/reader031/viewer/2022020208/55a5c4ab1a28abe16d8b458c/html5/thumbnails/13.jpg)
Storage
Dynamo
• Each node in Dynamo has a local persistence engine where data items are stored as binary objects.
• Different Dynamo instances may use different persistence engines (e.g. MySQL, Berkeley DB).
• Applications choose the persistence engine based on their object size distribution.
BigTable
• Data is stored in GFS in the SSTable file format.
• An SSTable is an immutable ordered map whose keys and values are arbitrary strings.
• SSTable supports "get by key" and "get by key range" requests.
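The SSTable behaviour described above (immutable, ordered, lookups by key and by key range) can be sketched in memory as follows. The `SSTable` class here is a hypothetical illustration and says nothing about the on-disk file format.

```python
import bisect


class SSTable:
    """Immutable ordered map from string keys to string values."""

    def __init__(self, items):
        pairs = sorted(dict(items).items())
        # Tuples are used so the contents cannot change after construction.
        self._keys = tuple(key for key, _ in pairs)
        self._values = tuple(value for _, value in pairs)

    def get(self, key):
        """Get by key: binary search over the sorted keys."""
        index = bisect.bisect_left(self._keys, key)
        if index < len(self._keys) and self._keys[index] == key:
            return self._values[index]
        return None

    def get_range(self, start, stop):
        """Get by key range: all pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))
```

Since the map never changes after it is built, updates have to go into a new SSTable rather than modifying an existing one, which is why a tablet is a sequence of such files.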
![Page 14: Dynamo and BigTable - Review and Comparison](https://reader031.vdocuments.mx/reader031/viewer/2022020208/55a5c4ab1a28abe16d8b458c/html5/thumbnails/14.jpg)
Membership and Failure detection
Dynamo
• Gossip-based protocol:
– Every second, each node contacts a peer chosen at random, and the two nodes exchange their membership data (every node maintains a persistent view of the membership).
BigTable
• Failed tablet servers are identified by regular handshakes between the master and all tablet servers.
[Figure: Dynamo ring with nodes A to G; BigTable master node.]
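The gossip exchange can be simulated in a few lines. This is a toy model under stated assumptions: each node's view is a dictionary from node name to a version number (the versioning scheme is invented for this sketch), and a seeded random generator stands in for the once-per-second timer.

```python
import random


def reconcile(view_a, view_b):
    """Merge two membership views, keeping the newest version of each entry."""
    merged = dict(view_a)
    for node, version in view_b.items():
        if version > merged.get(node, -1):
            merged[node] = version
    return merged


def gossip_round(views, rng):
    """Each node contacts one random peer and both adopt the merged view."""
    for node in list(views):
        peer = rng.choice([name for name in views if name != node])
        merged = reconcile(views[node], views[peer])
        views[node] = merged
        views[peer] = dict(merged)


def converged(views):
    first = next(iter(views.values()))
    return all(view == first for view in views.values())


rng = random.Random(0)
# Initially every node only knows about itself.
views = {name: {name: 1} for name in "ABCDEFG"}
rounds = 0
while not converged(views) and rounds < 1000:
    gossip_round(views, rng)
    rounds += 1
```

No node plays a special role in this exchange, which matches Dynamo's decentralized architecture; the BigTable approach instead relies on the master to check every tablet server.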
![Page 15: Dynamo and BigTable - Review and Comparison](https://reader031.vdocuments.mx/reader031/viewer/2022020208/55a5c4ab1a28abe16d8b458c/html5/thumbnails/15.jpg)
Dynamo vs. BigTable
| Criterion | Dynamo | BigTable |
| --- | --- | --- |
| Architecture | decentralized | centralized |
| Data model | key-value | sorted map |
| API | get, put | get, put, scan, delete |
| Security | no | access control |
| Partitioning | consistent hashing | key range based |
| Replication | successor nodes in the ring | chunkservers in GFS |
| Storage | plug-in | SSTables in GFS |
| Membership and failure detection | gossip-based protocol | handshakes initiated by master |
![Page 16: Dynamo and BigTable - Review and Comparison](https://reader031.vdocuments.mx/reader031/viewer/2022020208/55a5c4ab1a28abe16d8b458c/html5/thumbnails/16.jpg)
Open source implementations
![Page 17: Dynamo and BigTable - Review and Comparison](https://reader031.vdocuments.mx/reader031/viewer/2022020208/55a5c4ab1a28abe16d8b458c/html5/thumbnails/17.jpg)
Thank You