big table presentation-final

Bigtable: A Distributed Storage System for Structured Data. Presenters: Yunming Zhang, Conglong Li. Saturday, September 21, 13

DESCRIPTION

A presentation at Rice for COMP 520, distributed systems.

TRANSCRIPT

Page 1: Big table presentation-final

A Distributed Storage System for Structured Data

Bigtable

Presenters: Yunming Zhang, Conglong Li

Saturday, September 21, 13

Page 2: Big table presentation-final

References

SOCC 2010 Keynote Slides, Jeff Dean, Google

Introduction to Distributed Computing, Winter 2008, University of Washington

Page 3: Big table presentation-final

Motivation

Lots of (semi-)structured data at Google
    URLs: contents, crawl metadata, links
    Per-user data: user preference settings, search results
Scale is large
    Billions of URLs, hundreds of millions of users
Existing commercial databases don’t meet the requirements

Page 4: Big table presentation-final

Goals

Store and manage all the state reliably and efficiently
Allow asynchronous processes to update different pieces of data continuously
Very high read/write rates
Efficient scans over all or interesting subsets of data
Often want to examine data changes over time

Page 5: Big table presentation-final

BigTable vs. GFS

GFS provides raw data storage
We need:
    More sophisticated storage
    Key-value mapping
    Flexible enough to be useful
    Store semi-structured data
    Reliable, scalable, etc.

Page 6: Big table presentation-final

BigTable

Bigtable is a distributed storage system for managing large-scale structured data

Wide applicability
Scalability
High performance
High availability

Page 7: Big table presentation-final

Overview

Data Model
API
Implementation Structures
Optimizations
Performance Evaluation
Applications
Conclusions

Page 8: Big table presentation-final

Data Model

Sparse
Sorted
Multidimensional

Page 9: Big table presentation-final

Cell

Contains multiple versions of the data, indexed by timestamp

A cell is located by row key, column key, and timestamp

Treats data as an uninterpreted array of bytes, which lets clients serialize various forms of structured and semi-structured data

Supports automatic garbage collection per column family for managing versioned data
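
A minimal sketch of the cell model described on this slide, not Bigtable's implementation. The class name VersionedCells, the max_versions setting, and the example row and column keys are all hypothetical; the real system configures garbage collection per column family, while this toy applies one setting to every cell.

```python
from collections import defaultdict


class VersionedCells:
    """Toy map from (row key, column key) to timestamped versions of a value."""

    def __init__(self, max_versions=3):
        # max_versions is a hypothetical garbage-collection setting.
        self.max_versions = max_versions
        self.cells = defaultdict(dict)  # (row, column) -> {timestamp: bytes}

    def put(self, row, column, timestamp, value: bytes):
        versions = self.cells[(row, column)]
        versions[timestamp] = value
        # Garbage-collect: keep only the newest max_versions timestamps.
        for old in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[old]

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before timestamp (newest overall if None)."""
        versions = self.cells.get((row, column), {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None


table = VersionedCells(max_versions=2)
for ts, body in [(1, b"v1"), (2, b"v2"), (3, b"v3")]:
    table.put("com.example", "contents:", ts, body)
```

With max_versions=2 the oldest version is dropped on the third write, so a read at timestamp 1 finds nothing while a read without a timestamp returns the newest bytes.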

Page 11: Big table presentation-final

Row

Row key is an arbitrary string
Access to column data in a row is atomic
Row creation is implicit upon storing data
Rows are ordered lexicographically
Rows close together lexicographically usually reside on one or a small number of machines
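
To illustrate why lexicographic ordering gives locality, here is a small sketch under assumptions: the row keys are made-up reversed host names (a key scheme the slides do not mention), and scan_prefix is a hypothetical helper. Because keys are sorted, every row sharing a prefix sits in one contiguous run.

```python
import bisect

# Hypothetical row keys; sorting them models the table's row order.
rows = sorted([
    "com.cnn.www",
    "org.apache.hadoop",
    "com.cnn.www/sports",
    "com.example.mail",
    "edu.rice.cs",
])


def scan_prefix(sorted_keys, prefix):
    """Return every key that starts with prefix, reading one contiguous run."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order guarantees no later key can match
        matches.append(key)
    return matches
```

A call such as scan_prefix(rows, "com.cnn.") touches only neighbouring rows, which is what lets a scan over related rows stay on one or a few machines.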

Page 12: Big table presentation-final

Columns

Columns are grouped into column families
Column keys are named family:optional_qualifier
Column family
    Has associated type information
    Data in a family is usually of the same type

Page 14: Big table presentation-final

API

Metadata operations
    Create/delete tables and column families, change metadata, modify access control lists
Writes (atomic)
    Set(), DeleteCells(), DeleteRow()
Reads
    Scanner: read arbitrary cells in a Bigtable
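
The operations listed above can be mimicked with an in-memory stand-in. This is a sketch, not the Bigtable client library: the class ToyTable, the Python method names, and every signature are assumptions modelled loosely on the slide's Set(), DeleteCells(), DeleteRow(), and Scanner.

```python
import threading


class ToyTable:
    """In-memory stand-in; names mirror the slide, signatures are assumptions."""

    def __init__(self):
        self._rows = {}                # row key -> {column key: value}
        self._lock = threading.Lock()  # one coarse lock keeps each write atomic

    def set(self, row, column, value):
        with self._lock:
            # Row creation is implicit upon storing data.
            self._rows.setdefault(row, {})[column] = value

    def delete_cells(self, row, column):
        with self._lock:
            self._rows.get(row, {}).pop(column, None)

    def delete_row(self, row):
        with self._lock:
            self._rows.pop(row, None)

    def scan(self, start_row="", end_row=None, family=None):
        """Yield (row, column, value) in row order, optionally for one family."""
        with self._lock:
            snapshot = {r: dict(cols) for r, cols in self._rows.items()}
        for row in sorted(snapshot):
            if row < start_row or (end_row is not None and row >= end_row):
                continue
            for column, value in sorted(snapshot[row].items()):
                if family is None or column.split(":", 1)[0] == family:
                    yield row, column, value
```

The scan takes a snapshot under the lock and then filters by row range and column family, which is one simple way to keep readers from seeing a half-applied row write.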

Page 16: Big table presentation-final

Tablets

Large tables are broken into tablets at row boundaries
    A tablet holds a contiguous range of rows
    Clients can often choose row keys for locality
    Aim for ~100 MB to 200 MB of data per tablet
Each serving machine is responsible for ~100 tablets
    Fast recovery: 100 machines each pick up 1 tablet from a failed machine
    Fine-grained load balancing: migrate tablets away from an overloaded machine
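
A rough sketch of the two ideas on this slide: cutting a sorted table into tablets at row boundaries, and spreading a failed server's tablets over the survivors. The function names, the size threshold, and the round-robin policy are assumptions for illustration; the slides do not describe the master's actual assignment algorithm.

```python
def split_into_tablets(sorted_rows, max_bytes):
    """Cut a sorted list of (row_key, size_bytes) into contiguous row ranges."""
    tablets, current, current_bytes = [], [], 0
    for key, size in sorted_rows:
        # Start a new tablet at a row boundary once the size budget is exceeded.
        if current and current_bytes + size > max_bytes:
            tablets.append(current)
            current, current_bytes = [], 0
        current.append(key)
        current_bytes += size
    if current:
        tablets.append(current)
    return tablets


def reassign(failed_tablets, live_servers):
    """Spread a failed server's tablets round-robin over the live servers."""
    assignment = {server: [] for server in live_servers}
    for i, tablet in enumerate(failed_tablets):
        assignment[live_servers[i % len(live_servers)]].append(tablet)
    return assignment
```

With about 100 tablets per server and 100 live servers, reassign hands each survivor a single tablet, which is why recovery can proceed in parallel.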

Page 17: Big table presentation-final

Tablets and Splitting

Page 18: Big table presentation-final

System Structure

Master
    Metadata operations
    Load balancing
    Keep track of live tablet servers
    Master failure
Tablet server
    Accepts reads and writes to data

Page 19: Big table presentation-final

System Structure

Page 20: Big table presentation-final

System Structure

read/write

Page 21: Big table presentation-final

System Structure

Metadata operations

Page 22: Big table presentation-final

Locating Tablets

3-level hierarchical lookup scheme for tablets
Location is the IP address and port of the serving tablet server, stored in META tables
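
A toy version of the hierarchical lookup, assuming each index level is a sorted list of (end row key, target) pairs. The names root_tablet, meta_tablets, find, and locate, the addresses, and the "\uffff" end-of-table sentinel are all made up for this sketch.

```python
import bisect


def find(index, key):
    """index: sorted list of (end_row_key, target).
    Return the target of the first range whose end key is >= key."""
    ends = [end for end, _ in index]
    return index[bisect.bisect_left(ends, key)][1]


# Level 3: META tablets map row ranges to tablet-server locations (ip:port).
meta_tablets = {
    "meta0": [("g", "10.0.0.1:9000"), ("n", "10.0.0.2:9000")],
    "meta1": [("t", "10.0.0.3:9000"), ("\uffff", "10.0.0.4:9000")],
}
# Level 2: the root tablet maps row ranges to META tablets.
# "\uffff" is a hypothetical sentinel meaning "end of table".
root_tablet = [("n", "meta0"), ("\uffff", "meta1")]


def locate(row_key):
    # Level 1, a well-known pointer to the root tablet, is implicit here.
    meta_name = find(root_tablet, row_key)
    return find(meta_tablets[meta_name], row_key)
```

Each level is a binary search over a small sorted index, so a client that caches these results rarely needs to walk all three levels.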

Page 23: Big table presentation-final

Tablet Representation and Serving

Append-only tablet log
SSTable on GFS
    A sorted map of string to string
    All the data for a row is stored contiguously
Memtable write buffer
    A read must merge SSTable data with the recent writes still held in the memtable
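
The read path described above can be sketched with plain dictionaries standing in for the memtable and the SSTables. This is an illustration under assumptions (string keys, SSTables ordered oldest to newest, no deletions); read and merged_view are hypothetical names.

```python
def read(key, memtable, sstables):
    """Return the newest value for key: memtable first, then SSTables
    from newest to oldest. sstables is ordered oldest -> newest."""
    if key in memtable:
        return memtable[key]
    for sstable in reversed(sstables):
        if key in sstable:
            return sstable[key]
    return None


def merged_view(memtable, sstables):
    """Merge everything into one sorted map, newer sources overwriting older."""
    view = {}
    for sstable in sstables:  # oldest first
        view.update(sstable)
    view.update(memtable)     # most recent writes win
    return dict(sorted(view.items()))
```

The more SSTables a tablet accumulates, the more sources each read has to consult, which is the motivation for the compaction slide that follows.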

Page 24: Big table presentation-final

Tablet Representation and Serving

Page 25: Big table presentation-final

Tablet Representation and Serving

Page 26: Big table presentation-final

Compaction

Tablet state is represented as a set of immutable compacted SSTable files, plus the tail of the log
Minor compaction
    When the in-memory buffer fills up, freeze it and write it out as a new SSTable
Major compaction
    Periodically compact all SSTables for a tablet into a new base SSTable on GFS
    Storage from deletions is reclaimed at this point
    Produces new tables
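
A compact sketch of both compaction kinds, assuming dictionaries for SSTables, a tiny memtable limit so the behaviour is easy to see, and a hypothetical TOMBSTONE marker for deletions. The class Tablet and its method names are illustrative only.

```python
TOMBSTONE = object()  # hypothetical marker recording that a key was deleted


class Tablet:
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []  # immutable snapshots, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def delete(self, key):
        self.write(key, TOMBSTONE)

    def minor_compaction(self):
        # Freeze the full memtable as a new sorted SSTable, then start fresh.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def major_compaction(self):
        # Merge every SSTable into one base SSTable; deleted entries are
        # dropped here, which is when their storage is reclaimed.
        merged = {}
        for sstable in self.sstables:
            merged.update(sstable)
        base = {k: v for k, v in sorted(merged.items()) if v is not TOMBSTONE}
        self.sstables = [base]
```

Until the major compaction runs, the tombstone has to be kept so that it can hide the older value still sitting in an earlier SSTable.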

Page 28: Big table presentation-final

Goals

Reliable system for storing and managing all the state
Allow asynchronous processes to update different pieces of data continuously
Very high read/write rates
Efficient scans over all or interesting subsets of data
Often want to examine data changes over time

Page 29: Big table presentation-final

Locality Groups

Clients can group multiple column families together into a locality group

A separate SSTable is generated for each locality group

Enables more efficient reads
Can be declared to be in-memory
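
One way to picture locality groups is as a partition of each row's cells by column family, with one output map per group standing in for that group's SSTable. The function name, the group names, and the sample data are assumptions made for this sketch.

```python
def split_by_locality_group(row_cells, locality_groups):
    """row_cells: {row: {"family:qualifier": value}}.
    locality_groups: {group name: set of column families}.
    Returns {group name: {row: {column: value}}}, one map per group."""
    family_to_group = {
        family: group
        for group, families in locality_groups.items()
        for family in families
    }
    out = {group: {} for group in locality_groups}
    for row, cells in sorted(row_cells.items()):
        for column, value in cells.items():
            group = family_to_group[column.split(":", 1)[0]]
            out[group].setdefault(row, {})[column] = value
    return out


# Hypothetical example: small metadata columns kept apart from bulky contents.
data = {"com.cnn.www": {"contents:": "<html>", "language:": "EN",
                        "anchor:cnnsi.com": "CNN"}}
groups = {"meta": {"language", "anchor"}, "content": {"contents"}}
stores = split_by_locality_group(data, groups)
```

A reader that only wants the metadata columns can then open stores["meta"] and never touch the page contents.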

Page 30: Big table presentation-final

Compression

Many opportunities for compression
    Similar values in columns and cells
Within each SSTable for a locality group, encode compressed blocks
    Keep blocks small for random access
    Exploit the fact that many values are very similar
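
The trade-off between block size and random access can be sketched as follows. zlib is used purely as a stand-in because it ships with Python; the slides do not name the compression scheme, and the block size of 4 values and the newline separator are assumptions of this toy.

```python
import zlib


def compress_blocks(values, block_size=4):
    """Group consecutive values into small blocks, compressing each on its own.
    Values must not contain the newline separator used by this toy format."""
    blocks = []
    for i in range(0, len(values), block_size):
        raw = b"\n".join(values[i:i + block_size])
        blocks.append(zlib.compress(raw))
    return blocks


def read_value(blocks, index, block_size=4):
    """Random access: decompress only the block holding the requested value."""
    raw = zlib.decompress(blocks[index // block_size])
    return raw.split(b"\n")[index % block_size]


# Similar neighbouring values, as with many versions of the same page.
values = [b"<html><body>page %d</body></html>" % i for i in range(16)]
blocks = compress_blocks(values)
```

Small blocks mean a single lookup decompresses only a few neighbouring values, while the similarity between those neighbours is what the compressor exploits.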

Page 32: Big table presentation-final

Commit log and recovery

Single commit log file per tablet server
    Reduces the number of concurrent file writes to GFS
Tablet recovery
    Replay the log from the tablet's redo points, performing the same set of operations starting from the last persistent state
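
A sketch of recovery from a shared log, assuming each entry carries a sequence number and the tablet it belongs to, and that a redo point is simply the last sequence number already reflected in the tablet's persistent state. LogEntry and recover_tablet are hypothetical names.

```python
from collections import namedtuple

LogEntry = namedtuple("LogEntry", "seq tablet key value")


def recover_tablet(tablet, persisted_state, redo_point, commit_log):
    """Rebuild one tablet: start from its last persistent state and replay,
    in order, the shared log's entries for that tablet with seq > redo_point."""
    state = dict(persisted_state)
    for entry in sorted(commit_log, key=lambda e: e.seq):
        if entry.tablet == tablet and entry.seq > redo_point:
            state[entry.key] = entry.value
    return state


# One log interleaves the writes of every tablet on the server.
log = [
    LogEntry(1, "t1", "a", "1"),
    LogEntry(2, "t2", "x", "9"),
    LogEntry(3, "t1", "a", "2"),
    LogEntry(4, "t1", "b", "3"),
]
```

The cost of the single log shows up here: recovering one tablet means skipping over entries that belong to every other tablet the failed server was hosting.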

Page 34: Big table presentation-final

Performance evaluation

Test environment
    Based on a GFS cell with 1786 machines
    Two 400 GB IDE hard drives in each machine
    Two-level tree-shaped switched network
Performance tests
    Random read/write
    Sequential read/write

Page 35: Big table presentation-final

Single tablet-server performance

Random reads are the slowest
    Each read transfers a 64 KB SSTable block over the network from GFS to use only 1000 bytes
Random and sequential writes perform better
    Each tablet server appends all incoming writes to a single commit log
    Group commit
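
The random-read penalty follows from simple arithmetic on the two figures given on this slide. The function names are made up; the 64 KB and 1000-byte values come from the slide.

```python
BLOCK_BYTES = 64 * 1024  # SSTable block transferred per random read (slide)
VALUE_BYTES = 1000       # bytes the read actually uses (slide)


def read_amplification(block_bytes, value_bytes):
    """Bytes moved over the network per byte the client asked for."""
    return block_bytes / value_bytes


def useful_fraction(block_bytes, value_bytes):
    """Share of each transferred block that the read actually uses."""
    return value_bytes / block_bytes
```

Here read_amplification(BLOCK_BYTES, VALUE_BYTES) is about 65.5, so only around 1.5% of every transferred block is useful to a random read.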

Page 36: Big table presentation-final

Performance Scaling

Performance didn’t scale linearly
    Load imbalance in multiple-server configurations
    Larger data transfer overhead

Page 38: Big table presentation-final

Google Analytics

A service that analyzes traffic patterns at web sites
Raw click table
    A row for each end-user session
    Row key is (website name, time)
Summary table
    Generated by MapReduce jobs that extract recent session data from the raw click table
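
A small sketch of why a (website name, time) row key is convenient: once sorted, all sessions of one site are adjacent and in chronological order. The string encoding below (a "#" separator and zero-padded seconds) is purely an assumption; the slides give only the tuple, not its serialization.

```python
def session_row_key(website, start_time):
    """Hypothetical encoding: site name, a separator, zero-padded seconds."""
    return f"{website}#{start_time:010d}"


sessions = [("b.example", 50), ("a.example", 200),
            ("a.example", 30), ("b.example", 7)]
keys = sorted(session_row_key(site, t) for site, t in sessions)
```

Zero-padding matters in this encoding: without it, lexicographic order would place a start time of 200 before 30.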

Page 39: Big table presentation-final

Google Earth

Uses one table for preprocessing and one for serving
    Different latency requirements (disk vs. memory)
Each row in the imagery table represents a single geographic segment
Column family to store the data source
    One column for each raw image
    Very sparse

Page 40: Big table presentation-final

Personalized Search

Row key is a unique userid
A column family for each type of user action
Replicated across Bigtable clusters to increase availability and reduce latency

Page 41: Big table presentation-final

Conclusions

Bigtable provides highly scalable, high-performance, highly available, and flexible storage for structured data.

It provides a low-level read/write interface for other frameworks to build on.

It has enabled Google to deal with large-scale data efficiently.
