Google - Bigtable

Bigtable: A Distributed Storage System for Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber
Google, Inc.


TRANSCRIPT

Page 1: Google - Bigtable


Bigtable: A Distributed Storage System for Structured Data

Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber

Google, Inc.

Page 2: Google - Bigtable


Index
◦ Introduction
◦ Data Model
◦ API
◦ Building Blocks
◦ Implementation
◦ Refinements
◦ Real Applications
◦ Conclusions

Page 3: Google - Bigtable


Introduction
1. Motivation
2. What is a Bigtable?
3. Why not a DBMS?

Page 4: Google - Bigtable


Introduction : Motivation
Lots of structured data at Google

◦ Web pages, geographic info, user data, mail

Millions of machines
Many different projects/applications

Page 5: Google - Bigtable


Introduction : Why not a DBMS?
A DBMS provides more than Google needs
Google required a DB with wide scalability, wide applicability, high performance, and high availability

Low-level storage optimizations help performance significantly

Cost would be very high
◦ Most DBMSs require very expensive infrastructure

Page 6: Google - Bigtable


Introduction : What is a Bigtable?
Bigtable is a distributed storage system for managing structured data
Achieves several goals
◦ Wide applicability, scalability, high performance

Scalable
◦ Terabytes of in-memory data
◦ Petabytes of disk-based data
◦ Millions of reads/writes per second, efficient scans

Self-managing
◦ Servers can be added/removed dynamically
◦ Servers adjust to load imbalance

Page 7: Google - Bigtable


Data Model
1. Row
2. Column families
3. Timestamps

Page 8: Google - Bigtable


Data Model : Row
The row keys in a table are arbitrary strings
Data is maintained in lexicographic order by row key
A row range is called a "tablet", which is the unit of distribution and load balancing
Rows are sorted by row key within a tablet
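To make the ordering concrete, here is a minimal sketch (our own toy code, not Google's): C++'s std::map already keeps string keys in lexicographic order, so a tablet can be modeled as a contiguous half-open range of row keys. All names are illustrative.

```cpp
#include <map>
#include <string>
#include <vector>

// Rows sorted lexicographically by row key, as in Bigtable's data model.
using Rows = std::map<std::string, std::string>;  // row key -> value

// Collect the row keys belonging to the tablet [start, end).
std::vector<std::string> TabletKeys(const Rows& rows,
                                    const std::string& start,
                                    const std::string& end) {
  std::vector<std::string> keys;
  for (auto it = rows.lower_bound(start);
       it != rows.end() && it->first < end; ++it) {
    keys.push_back(it->first);
  }
  return keys;
}
```

Because related row keys (e.g. reversed URLs under one domain) sort next to each other, they land in the same tablet, which is what makes short scans efficient.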

Page 9: Google - Bigtable


Data Model : Column Families
Column keys are grouped into sets called "column families"
A column family is the basic unit of access control
A column key is named using the syntax "family:qualifier"
Access control and disk/memory accounting are performed at the column-family level
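A small illustrative helper (not part of the paper's API) shows the column-key syntax: the family is everything before the first ':', and that is the part access control and accounting key off.

```cpp
#include <string>
#include <utility>

// Split a column key of the form "family:qualifier" at the first ':'.
// A key with no ':' is treated as a bare family name.
std::pair<std::string, std::string> SplitColumnKey(const std::string& key) {
  auto pos = key.find(':');
  if (pos == std::string::npos) return {key, ""};
  return {key.substr(0, pos), key.substr(pos + 1)};
}
```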

Page 10: Google - Bigtable


Data Model : Timestamps
Each cell in a Bigtable can contain multiple versions of the same data
Versions are sorted by timestamp in descending order
Timestamps are 64-bit integers
They represent real time in microseconds, or are assigned by the client application
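A sketch of one versioned cell under this model (class name and layout are ours): versions are keyed by a 64-bit timestamp and iterate newest-first, which std::greater gives us directly.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// One cell holding multiple timestamped versions, newest first.
class Cell {
 public:
  void Put(int64_t ts_micros, const std::string& value) {
    versions_[ts_micros] = value;
  }
  // The most recent version is the first entry in descending order.
  const std::string& Latest() const { return versions_.begin()->second; }

 private:
  std::map<int64_t, std::string, std::greater<int64_t>> versions_;
};
```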

Page 11: Google - Bigtable


Data Model : Example
[Figure: example table showing a row, column families, and timestamped cells]

Page 12: Google - Bigtable


API
The Bigtable API provides functions to:

◦ Create/delete tables and column families
◦ Change table and column-family metadata
◦ Look up values from individual rows
◦ Iterate over a subset of the data

Supports single-row transactions
Can be used with MapReduce (HBase)

Page 13: Google - Bigtable


API : Example
Uses a Scanner to iterate over all anchors in a particular row:

Table *T = OpenOrDie("/bigtable/web/webtable");
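The real client library is not public, so here is a toy stand-in for the same idea: one row's columns held in a sorted map, from which we collect every column in the "anchor" family. None of these names come from the actual Bigtable API.

```cpp
#include <map>
#include <string>
#include <vector>

// One row's columns, keyed "family:qualifier" -> value.
using RowColumns = std::map<std::string, std::string>;

// Return the qualifiers of all columns in the "anchor" family.
std::vector<std::string> AnchorsInRow(const RowColumns& row) {
  std::vector<std::string> anchors;
  // Columns of one family are contiguous in the sorted map, so we can
  // scan just the range whose keys start with "anchor:".
  for (auto it = row.lower_bound("anchor:"); it != row.end(); ++it) {
    if (it->first.compare(0, 7, "anchor:") != 0) break;
    anchors.push_back(it->first.substr(7));
  }
  return anchors;
}
```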

Page 14: Google - Bigtable


Building Blocks
Uses the distributed Google File System (GFS) to store log and data files

A Bigtable cluster typically operates in a shared pool of machines

Depends on a cluster management system

The Google SSTable file format is used internally to store Bigtable data

Relies on a highly available and persistent distributed lock service called Chubby

Page 15: Google - Bigtable


Building Blocks : GFS & SSTable & Chubby
Google File System:

◦ Google File System grew out of an earlier Google effort, "BigFiles"

◦ Chosen for high data throughput

Page 16: Google - Bigtable


Building Blocks : GFS & SSTable & Chubby
SSTable:

◦ Provides a persistent, ordered map from keys to values

◦ Contains a sequence of blocks, located via a block index
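The block-index idea can be sketched as follows (our own simplification of the SSTable format): the index records the last key of each data block, so a binary search over it names the single block that could hold a given key.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Simplified SSTable block index: last key of each block, in order.
struct BlockIndex {
  std::vector<std::string> last_keys;

  // Block number to read for `key`, or -1 if the key is past the end
  // of the file (so no block needs to be read at all).
  int BlockFor(const std::string& key) const {
    auto it = std::lower_bound(last_keys.begin(), last_keys.end(), key);
    if (it == last_keys.end()) return -1;
    return static_cast<int>(it - last_keys.begin());
  }
};
```

With the index held in memory, a point lookup costs at most one disk read: one seek into the chosen block.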

Page 17: Google - Bigtable


Building Blocks : GFS & SSTable & Chubby
Chubby:

◦ Ensures that there is at most one active master at any time

◦ Stores the bootstrap location of Bigtable data

◦ Discovers tablet servers and finalizes tablet server deaths

◦ Stores Bigtable schema information (the column-family information for each table)

Page 18: Google - Bigtable


Implementation
1. Tablet Location
2. Tablet Assignment
3. Tablet Serving

Page 19: Google - Bigtable


Implementation
Three major components:

◦ A library that is linked into every client
◦ One master server
◦ Many tablet servers

Page 20: Google - Bigtable


Implementation : Tablet Location
Uses a three-level hierarchy, analogous to that of a B+-tree, to store tablet location information (at most three levels)

The first level is a file stored in Chubby that contains the location of the root tablet

Page 21: Google - Bigtable


Implementation : Tablet Location
Root tablet
◦ The first tablet in the METADATA table
◦ Never split, to ensure that the tablet location hierarchy has no more than three levels

METADATA tablet
◦ Stores the location of a tablet under a row key that is an encoding of the tablet's table identifier and its end row

Page 22: Google - Bigtable

Implementation : Tablet Assignment

Master server
◦ Assigns tablets to tablet servers
◦ Detects the presence or absence (expiration) of tablet servers
◦ Balances tablet-server load
◦ Handles schema changes such as table and column-family creations

Tablet server
◦ Manages a set of tablets (ten to a thousand tablets per tablet server)
◦ Handles read/write requests to the tablets
◦ Splits tablets that have grown too large

Page 23: Google - Bigtable


Implementation : Tablet Serving
Updates are committed to a commit log that stores redo records

Recently committed updates are stored in memory in the memtable

Older updates are stored in a sequence of SSTables

Page 24: Google - Bigtable


Refinements
1. Locality groups
2. Compression
3. Caching for read performance
4. Bloom filters
5. Commit-log implementation

Page 25: Google - Bigtable


Refinements
Locality groups
◦ Clients can group multiple column families together into a locality group

Compression
◦ Small portions of an SSTable can be read without decompressing the entire file
◦ Encodes at 100-200 MB/s
◦ Decodes at 400-1000 MB/s
◦ Achieves roughly a 10-to-1 reduction in space

Page 26: Google - Bigtable


Refinements
Caching for read performance
◦ Tablet servers use two levels of caching: the Scan Cache and the Block Cache

Bloom filters
◦ Can be created for the SSTables in a particular locality group

Commit-log implementation
◦ Mutations for different tablets are co-mingled in the same physical log file

Page 27: Google - Bigtable


Real Applications
1. Google Analytics
2. Personalized Search

Page 28: Google - Bigtable


Real Applications
Google Analytics
◦ Uses two of the tables: the raw click table (~200 TB) and the summary table (~20 TB)
◦ Uses MapReduce

Personalized Search
◦ Stores each user's history
◦ Uses MapReduce

Page 29: Google - Bigtable


Conclusions
Bigtable clusters have been in production use at Google since April 2005

Provides performance and high availability

Google found significant advantages to building its own storage solution

Apache HBase is based on Bigtable

Page 30: Google - Bigtable


Thank you!