EECS 584, Fall 2011 1
PNUTS: Yahoo’s Hosted Data Serving Platform
Jonathan Danaparamita jdanap at umich dot edu
University of Michigan
Some slides/illustrations from Adam Silberstein’s PNUTS presentation at OSCON, July 26, 2011
What is PNUTS?
• Trivia
– Abbreviation
– Real meaning
• Goals/Requirements
– Scalability, scalability, scalability
– Low latency
– Availability (… or else, lost revenue)
– A certain degree of consistency
PNUTS Applications
• Yahoo!’s user database
• Social applications
• Content metadata
• Listings management
• Session data
Design Outline
• Data and query model
– Simple relational model
– Key lookups and range scans (hash and ordered tables)
• Global access
– Asynchronous (global) replication
– Low-latency local access
– Consistency
• Fault tolerance and load balancing
Data Storage and Retrieval
• Tables are horizontally partitioned into tablets (groups of records)
• Storage unit: where the tablets are stored
• Router
– Interval mapping: the boundaries of each tablet
– Locates the storage unit for a given tablet
– Holds a cached copy of the tablet controller's mapping
• Tablet controller
– Stores all the mappings
– Polled by the routers for changes
– Determines when to split or move tablets
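A minimal Python sketch of the router/tablet-controller split described above. The class names, the version counter, and the storage-unit names are hypothetical; the point is only that a router answers lookups from its cached interval map and picks up changes when it polls the controller.

```python
from bisect import bisect_right


class TabletController:
    """Hypothetical owner of the authoritative tablet -> storage unit map."""

    def __init__(self, lower_bounds, units):
        self.version = 1
        self.lower_bounds = list(lower_bounds)  # sorted smallest key of each tablet
        self.units = list(units)                # units[i] stores tablet i

    def move_tablet(self, tablet, new_unit):
        self.units[tablet] = new_unit
        self.version += 1                       # routers see this on their next poll


class Router:
    """Hypothetical router that serves lookups from a cached copy of the map."""

    def __init__(self, controller):
        self.controller = controller
        self.cached_version = 0
        self.poll()

    def poll(self):
        # Refresh the cached map only if the controller's version changed.
        c = self.controller
        if c.version != self.cached_version:
            self.lower_bounds = list(c.lower_bounds)
            self.units = list(c.units)
            self.cached_version = c.version

    def storage_unit_for(self, key):
        # Interval mapping: the last tablet whose lower bound is <= key.
        # The first lower bound is "", so every string key falls in some tablet.
        tablet = bisect_right(self.lower_bounds, key) - 1
        return self.units[tablet]


controller = TabletController(["", "G", "N"], ["su-1", "su-3", "su-5"])
router = Router(controller)
print(router.storage_unit_for("Lemon"))  # su-3
controller.move_tablet(1, "su-9")
print(router.storage_unit_for("Lemon"))  # still su-3: the cache is stale
router.poll()
print(router.storage_unit_for("Lemon"))  # su-9
```

Between a move and the next poll, this router would still send requests to the old storage unit; how PNUTS handles such misdirected requests is not covered on this slide.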
PNUTS: Single Region
[Figure: table FOO is split into tablets 1 through M, spread across storage units 1 through n, each holding key/JSON records; the diagram also shows the routers, the tablet controller, and a VIP.]
• Routers
– Maintain the map from database.table.key to tablet to storage unit
– Route client requests to the correct storage unit
– Cache the maps from the tablet controller
• Storage units
– Store records
– Service get/set/delete requests
From Silberstein’s OSCON 2011 slide
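The two callouts can be read as one request path: the router maps database.table.key to a tablet and the tablet to a storage unit, and the storage unit services get/set/delete on key/JSON records. This sketch assumes hypothetical names (`StorageUnit`, `route`, `tablet_of`), a dict-per-tablet layout, and a toy two-tablet split of table FOO at key "M"; it is not PNUTS's storage engine.

```python
import json


class StorageUnit:
    """Hypothetical storage unit: key -> JSON text records, grouped by tablet."""

    def __init__(self, name):
        self.name = name
        self.tablets = {}  # tablet id -> {key: JSON text}

    def set(self, tablet, key, record):
        self.tablets.setdefault(tablet, {})[key] = json.dumps(record)

    def get(self, tablet, key):
        text = self.tablets.get(tablet, {}).get(key)
        return None if text is None else json.loads(text)

    def delete(self, tablet, key):
        return self.tablets.get(tablet, {}).pop(key, None) is not None


def route(tablet_of, unit_of, name, key):
    """Map database.table.key -> tablet -> storage unit."""
    tablet = tablet_of(name, key)
    return tablet, unit_of[tablet]


def tablet_of(name, key):
    # Hypothetical split of each table into two tablets at key "M".
    return (name, 0 if key < "M" else 1)


su1, su2 = StorageUnit("su-1"), StorageUnit("su-2")
unit_of = {("db.FOO", 0): su1, ("db.FOO", 1): su2}

for key, record in [("Lime", {"color": "green"}), ("Orange", {"color": "orange"})]:
    tablet, unit = route(tablet_of, unit_of, "db.FOO", key)
    unit.set(tablet, key, record)

tablet, unit = route(tablet_of, unit_of, "db.FOO", "Lime")
print(unit.name, unit.get(tablet, "Lime"))  # su-1 {'color': 'green'}
print(unit.delete(tablet, "Lime"))          # True
print(unit.get(tablet, "Lime"))             # None
```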
Data Retrieval for Hash
• The hash space is divided into intervals, each corresponding to a tablet
• Key -> hash value -> interval -> storage unit
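A sketch of this lookup chain, assuming a 16-bit hash space so that the boundary labels from the hash data-model figure (0x0000, 0x2AF3, 0x911F) can serve as tablet lower bounds. The hash function (MD5 truncated to 16 bits) and the storage-unit names are assumptions, not what PNUTS uses.

```python
import hashlib
from bisect import bisect_right

# Tablet lower bounds in a 16-bit hash space (labels from the hash figure).
BOUNDARIES = [0x0000, 0x2AF3, 0x911F]
UNITS = ["su-1", "su-2", "su-3"]  # hypothetical storage unit for each tablet


def hash16(key):
    # Assumption: MD5 truncated to 16 bits stands in for PNUTS's hash function.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:2], "big")


def locate_hashed(key):
    h = hash16(key)                           # key -> hash value
    tablet = bisect_right(BOUNDARIES, h) - 1  # hash value -> interval (tablet)
    return h, tablet, UNITS[tablet]           # tablet -> storage unit


for fruit in ["Apple", "Avocado", "Banana"]:
    h, tablet, unit = locate_hashed(fruit)
    print(f"{fruit}: hash=0x{h:04X}, tablet {tablet}, {unit}")
```

Because the hash scatters keys, alphabetically adjacent keys usually land in different tablets, which is why range scans need the ordered table.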
Data Model: Hash
Primary Key    Record
Grape {"liquid" : "wine"}
Lime {"color" : "green"}
Apple {"quote" : "Apple a day keeps the …"}
Strawberry {"spread" : "jam"}
Orange {"color" : "orange"}
Avocado {"spread" : "guacamole"}
Lemon {"expression" : "expensive crap"}
Tomato {"classification" : "yes… fruit"}
Banana {"expression" : "goes bananas"}
Kiwi {"expression" : "New Zealand"}
[Figure: records grouped into tablets by hash interval; boundary labels 0x0000, 0x2AF3, 0x911F]
Data Retrieval for Ordered Table
• Key -> interval -> storage unit
– As opposed to hash: key -> hash value -> interval -> storage unit
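For comparison, the ordered-table lookup drops the hash step and compares the key directly with the interval boundaries. The three-tablet split at "Grape" and "Lime" and the unit names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical split of the ordered table into three tablets by key range:
# tablet i holds keys k with LOWER_BOUNDS[i] <= k < LOWER_BOUNDS[i + 1].
LOWER_BOUNDS = ["", "Grape", "Lime"]
UNITS = ["su-1", "su-2", "su-3"]  # hypothetical storage unit for each tablet


def locate_ordered(key):
    # Key -> interval -> storage unit: no hash step, the key itself is compared.
    tablet = bisect_right(LOWER_BOUNDS, key) - 1
    return tablet, UNITS[tablet]


for key in ["Avocado", "Grape", "Kiwi", "Lemon", "Tomato"]:
    print(key, locate_ordered(key))
```

Neighboring keys such as Grape, Kiwi, and Lemon end up in the same tablet, so a range over them touches few tablets; under hashing they would be scattered.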
Data Model: Ordered
Primary Key    Record    (records stored in key order; each tablet holds a contiguous key range)
Apple {"quote" : "Apple a day keeps the …"}
Avocado {"spread" : "guacamole"}
Banana {"expression" : "goes bananas"}
Grape {"liquid" : "wine"}
Kiwi {"expression" : "New Zealand"}
Lemon {"expression" : "expensive crap"}
Lime {"color" : "green"}
Orange {"color" : "orange"}
Strawberry {"spread" : "jam"}
Tomato {"classification" : "yes… fruit"}
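A range scan over the ordered table above can be sketched as: locate the tablet containing the start key, then read tablets in key order until passing the end key. The records are the slide's; the split into three tablets (and the `range_scan` helper) is a hypothetical choice for illustration.

```python
from bisect import bisect_left, bisect_right

# Records from the ordered-table slide, in primary-key order.
# The split into three tablets is hypothetical.
TABLETS = [
    [("Apple", {"quote": "Apple a day keeps the …"}),
     ("Avocado", {"spread": "guacamole"}),
     ("Banana", {"expression": "goes bananas"})],
    [("Grape", {"liquid": "wine"}),
     ("Kiwi", {"expression": "New Zealand"}),
     ("Lemon", {"expression": "expensive crap"})],
    [("Lime", {"color": "green"}),
     ("Orange", {"color": "orange"}),
     ("Strawberry", {"spread": "jam"}),
     ("Tomato", {"classification": "yes… fruit"})],
]
LOWER_BOUNDS = ["", "Grape", "Lime"]  # smallest key each tablet may hold


def range_scan(start, end):
    """Yield (key, record) for start <= key < end, visiting tablets in key order."""
    first = bisect_right(LOWER_BOUNDS, start) - 1  # tablet containing start
    for tablet in TABLETS[first:]:
        keys = [k for k, _ in tablet]
        for key, record in tablet[bisect_left(keys, start):]:
            if key >= end:
                return  # past the end of the range: stop scanning
            yield key, record


print([k for k, _ in range_scan("Banana", "Lime")])
# ['Banana', 'Grape', 'Kiwi', 'Lemon']
```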
Tablet Splitting & Balancing
• Each storage unit holds many tablets (horizontal partitions of the table)
• Tablets may grow over time; overfull tablets are split
• A storage unit may become a hotspot
– Shed load by moving tablets to other servers
From Silberstein’s OSCON 2011 slide
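Both balancing actions on this slide can be sketched in a few lines. The split threshold, the per-unit load limit, the midpoint split, and the greedy "move the hottest tablet to the least-loaded unit" policy are all assumptions for illustration; the slide does not say how the tablet controller chooses splits or moves.

```python
MAX_TABLET_RECORDS = 4  # hypothetical split threshold
MAX_UNIT_LOAD = 100     # hypothetical requests/second one storage unit should carry


def split_if_overfull(keys):
    """Split a tablet (a sorted list of keys) at its midpoint if it is overfull."""
    if len(keys) <= MAX_TABLET_RECORDS:
        return [keys]
    mid = len(keys) // 2
    return [keys[:mid], keys[mid:]]


def shed_load(placement, load, units):
    """Greedily move the hottest tablets off overloaded storage units.

    placement: tablet -> storage unit; load: tablet -> requests/second.
    Returns a new placement; the input is not modified.
    """
    placement = dict(placement)
    unit_load = {u: 0 for u in units}
    for tablet, unit in placement.items():
        unit_load[unit] += load[tablet]
    for tablet in sorted(placement, key=lambda t: load[t], reverse=True):
        src = placement[tablet]
        if unit_load[src] <= MAX_UNIT_LOAD:
            continue                                # not a hotspot
        dst = min(unit_load, key=unit_load.get)     # least-loaded unit
        if dst == src or unit_load[dst] + load[tablet] > MAX_UNIT_LOAD:
            continue                                # moving would not help
        placement[tablet] = dst
        unit_load[src] -= load[tablet]
        unit_load[dst] += load[tablet]
    return placement


print(split_if_overfull(["Apple", "Avocado", "Banana", "Grape", "Kiwi"]))
# [['Apple', 'Avocado'], ['Banana', 'Grape', 'Kiwi']]
print(shed_load({"t1": "su-1", "t2": "su-1", "t3": "su-1", "t4": "su-2"},
                {"t1": 60, "t2": 50, "t3": 20, "t4": 10},
                ["su-1", "su-2"]))
# {'t1': 'su-2', 't2': 'su-1', 't3': 'su-1', 't4': 'su-2'}
```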
Data Model: Limitations
• No constraints
– Enforcing them is a challenge in an asynchronous system
• No ad hoc queries (joins, group-by, …)
– Supporting them is a challenge while meeting the response-time SLA
• MapReduce interface
– Hadoop: MapReduce implementation for large-scale data analysis
– Pig: the language
Yahoo! Message Broker
• Pub/sub (publish/subscribe)
– Clients subscribe to updates
– Updates made by one client are propagated to the other clients
– Thus, subscribing clients are notified of the updates
• Usages
– Updates: an update is “committed” only once it is published to YMB
– Notifications
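A toy in-process stand-in for the broker illustrates the commit rule: an update counts as committed once it is appended to the broker's log, and subscribers (for example, replicas in other regions) apply it later. The `MessageBroker` class, the explicit `deliver()` step, and the user record are hypothetical; YMB is a distributed service, and this sketch does not model its durability or delivery guarantees.

```python
from collections import deque


class MessageBroker:
    """Hypothetical in-process stand-in for YMB: an ordered log plus queues."""

    def __init__(self):
        self.log = []          # published updates, in publish order
        self.subscribers = {}  # name -> (callback, pending updates)

    def subscribe(self, name, callback):
        self.subscribers[name] = (callback, deque())

    def publish(self, update):
        # The update counts as committed once it is in the log.
        self.log.append(update)
        for _, pending in self.subscribers.values():
            pending.append(update)
        return len(self.log) - 1  # position of the update in the log

    def deliver(self):
        # A real broker delivers asynchronously; here queues are drained on demand.
        for callback, pending in self.subscribers.values():
            while pending:
                callback(pending.popleft())


broker = MessageBroker()
replica_east, replica_west2 = {}, {}
broker.subscribe("east", lambda u: replica_east.update({u[0]: u[1]}))
broker.subscribe("west2", lambda u: replica_west2.update({u[0]: u[1]}))

broker.publish(("alice", {"status": "online"}))  # committed at this point
print(replica_east)  # {}: subscribers have not been notified yet
broker.deliver()
print(replica_east)  # {'alice': {'status': 'online'}}
```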
Geographical Replication
• Copies of a “single region” deployment are kept across data centers
– Why?
  • Lower latency for regions far from the “master” region
  • Backup
– But... higher latency for updates
• On an update, the changes are propagated to all replicas
– This is done using the Yahoo! Message Broker
– This is where the extra update latency comes from
PNUTS Multi-Region
[Figure: three data centers (DC1, DC2, DC3), each with routers behind a VIP, storage units holding key/JSON records, and a tablet controller; table XYZ is split into tablets 1 through M. The data centers are linked by the Tribble message bus (messaging layer). Applications and a filer also appear in the diagram.]
From Silberstein’s OSCON 2011 slide
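Consistency Model
• Per-record timeline consistency
– All replicas of a record apply its updates in the same order; a replica may lag behind, but never sees them out of order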
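A sketch of the per-record timeline: the record's master stamps each write with the next version number, and every replica applies versions strictly in order, so replicas may be stale but all follow the same sequence. The class names and the buffering of early arrivals are assumptions; the slide only states the guarantee.

```python
class RecordMaster:
    """Hypothetical master for one record: orders all of its writes."""

    def __init__(self):
        self.version = 0

    def write(self, value):
        self.version += 1
        return (self.version, value)  # update to propagate to every replica


class Replica:
    """Applies a record's updates strictly in version order; may lag behind."""

    def __init__(self):
        self.version, self.value = 0, None
        self.early = {}  # updates that arrived ahead of a missing version

    def receive(self, update):
        version, value = update
        if version <= self.version:
            return  # duplicate of an update already applied
        self.early[version] = value
        # Apply every consecutive version now available; never skip one.
        while self.version + 1 in self.early:
            self.version += 1
            self.value = self.early.pop(self.version)

    def read(self):
        return self.version, self.value


master = RecordMaster()
u1, u2, u3 = master.write("v1"), master.write("v2"), master.write("v3")
east = Replica()
east.receive(u2)
print(east.read())  # (0, None): v2 waits for v1
east.receive(u1)
print(east.read())  # (2, 'v2'): stale, but on the record's timeline
east.receive(u3)
print(east.read())  # (3, 'v3')
```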
Record-level Mastering
• One cluster is assigned as the “master” for each record
• An update made locally is sent to the record master to complete the “commit”
• Choosing the right master
– Typically, 85% of writes come from the same datacenter
– Therefore, make that “frequent writer” datacenter the master
– If the writes move, reassign the master accordingly
• The tablet master (not the record master) enforces primary key constraints
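One way to sketch "choose the frequent writer as master": track the origin region of a record's recent writes and hand mastership to a region that dominates them. The `RecordMastership` class, the window of 5 writes, and the threshold of 4 are hypothetical, not values from the slide; the point is only that mastership moves when the write pattern clearly shifts.

```python
from collections import Counter, deque


class RecordMastership:
    """Hypothetical per-record mastership tracker.

    Assumption: remember the origin region of the last `window` writes and
    make a region the master once it issued at least `threshold` of them.
    """

    def __init__(self, master, window=5, threshold=4):
        self.master = master
        self.threshold = threshold
        self.recent = deque(maxlen=window)

    def write_from(self, region):
        """Record a write; return True if it had to be forwarded to the master."""
        forwarded = region != self.master
        self.recent.append(region)
        top_region, count = Counter(self.recent).most_common(1)[0]
        if top_region != self.master and count >= self.threshold:
            self.master = top_region  # the frequent writer becomes the master
        return forwarded


record = RecordMastership(master="west1")
for region in ["west1", "east", "east", "east", "east"]:
    record.write_from(region)
print(record.master)  # east
```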
Recovery
• Find the replica to copy from
• Send a checkpoint message so that in-flight updates are applied
• Copy the lost tablets from the chosen replica
• Which replica to choose?
– A nearby “backup region” can serve this need efficiently
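The three recovery steps can be sketched as follows. `RegionReplica`, `recover_tablet`, the round-trip times, and the modeling of the checkpoint as "apply all queued in-flight updates" are assumptions for illustration, not PNUTS's recovery protocol.

```python
from dataclasses import dataclass, field


@dataclass
class RegionReplica:
    """Hypothetical remote copy of the tablets held by a failed storage unit."""
    name: str
    rtt_ms: float                                  # hypothetical round-trip time
    tablets: dict = field(default_factory=dict)    # tablet id -> {key: record}
    in_flight: list = field(default_factory=list)  # (tablet, key, record) not yet applied

    def checkpoint(self):
        # Apply every update that was in flight when the checkpoint arrived.
        for tablet, key, record in self.in_flight:
            self.tablets.setdefault(tablet, {})[key] = record
        self.in_flight.clear()


def recover_tablet(tablet, replicas, backup_region=None):
    # 1. Choose the source: the designated backup region, else the nearest replica.
    by_name = {r.name: r for r in replicas}
    source = by_name.get(backup_region) or min(replicas, key=lambda r: r.rtt_ms)
    # 2. Checkpoint so in-flight updates are reflected in the source copy.
    source.checkpoint()
    # 3. Copy the lost tablet from the chosen replica.
    return source.name, dict(source.tablets.get(tablet, {}))


west2 = RegionReplica("west2", rtt_ms=20, tablets={"t7": {"alice": {"v": 1}}},
                      in_flight=[("t7", "alice", {"v": 2})])
east = RegionReplica("east", rtt_ms=80, tablets={"t7": {"alice": {"v": 1}}})
print(recover_tablet("t7", [west2, east]))  # ('west2', {'alice': {'v': 2}})
```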
Asynchronous Replication
[Figure]
From Silberstein’s OSCON 2011 slide
Experiments
• Metric: latency
• Compared: hash and ordered tables
• Cluster: three-region PNUTS cluster
– 2 regions in the west, 1 in the east
• Parameters
Inserting Data
• One region (West 1) is the tablet master
• Hash table: 99 clients (33 per region); ordered (MySQL) table: 60 clients
• 1 million records, 1/3 per region
• Result (insert latency):
– Hash: West 1: 75.6 ms; West 2: 131.5 ms; East: 315.5 ms
– Ordered: West 1: 33 ms; West 2: 105.8 ms; East: 324.5 ms
• Lesson: the ordered (MySQL) table inserts faster than the hash table in the West regions, although it is more vulnerable to contention
• More observations
Varying Load
• Request rate varied from 1,200 to 3,600 requests/second, with 10% writes
• Result:
• Observation or anomaly?
Varying Read/Write Ratio
• Write ratio varied from 0 to 50%
• Fixed load of 1,200 requests/second
Varying Number of Storage Units
• Storage units per region varied from 2 to 5
• 10% writes, 1,200 requests/second
Varying Size of Range Scans
• Range scan size varied from 0.01% to 0.1% of the table
• Ordered table only
• 30 clients vs. 300 clients
Thank you
Questions?