
Page 1: APWeb’2011 Tutorial

APWeb’2011 Tutorial

Jiaheng Lu and Sai Wu
Renmin University of China, National University of Singapore

Cloud Computing and Scalable Data Management

Page 2: APWeb’2011 Tutorial

Outline

• Cloud computing
• Map/Reduce, Bigtable and PNUTS
• CAP Theorem and Datalog
• Data indexing in the clouds
• Conclusion and open issues

Part 1

Part 2

Page 3: APWeb’2011 Tutorial

Cloud computing

Page 4: APWeb’2011 Tutorial
Page 5: APWeb’2011 Tutorial

Why do we use cloud computing?

Page 6: APWeb’2011 Tutorial

Why do we use cloud computing?

Case 1:
• Write a file
• Save it
• The computer goes down and the file is lost

With cloud computing, files are always stored in the cloud and are never lost.

Page 7: APWeb’2011 Tutorial

Why do we use cloud computing?

Case 2:
• Use IE: download, install, use
• Use QQ: download, install, use
• Use C++: download, install, use
• ……

With cloud computing, get the service from the cloud instead.

Page 8: APWeb’2011 Tutorial

What is cloud and cloud computing?

Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.

Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.

Page 9: APWeb’2011 Tutorial

Characteristics of cloud computing

• Virtual. Software, databases, Web servers, operating systems, storage and networking are provided as virtual servers.

• On demand. Add and subtract processors, memory, network bandwidth and storage.

Page 10: APWeb’2011 Tutorial

Types of cloud service

• IaaS: Infrastructure as a Service
• PaaS: Platform as a Service
• SaaS: Software as a Service

Page 11: APWeb’2011 Tutorial
Page 12: APWeb’2011 Tutorial
Page 13: APWeb’2011 Tutorial


Page 14: APWeb’2011 Tutorial

Introduction to MapReduce

Page 15: APWeb’2011 Tutorial

MapReduce Programming Model

• Inspired by the map and reduce operations commonly used in functional programming languages like Lisp.

• Users implement an interface of two primary methods:
– 1. Map: (key1, val1) → (key2, val2)
– 2. Reduce: (key2, [val2]) → [val3]

Page 16: APWeb’2011 Tutorial

Map operation

• Map, a pure function written by the user, takes an input key/value pair and produces a set of intermediate key/value pairs.
– e.g. (doc-id, doc-content)

• By analogy with SQL, map can be viewed as the group-by clause of an aggregate query.

Page 17: APWeb’2011 Tutorial

Reduce operation

• On completion of the map phase, all the intermediate values for a given output key are combined into a list and given to a reducer.

• Can be viewed as an aggregate function (e.g., average) that is computed over all the rows with the same group-by attribute.
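The SQL analogy on the two slides above can be sketched in a few lines of Python. This is a single-process toy, not a distributed runtime: `run_mapreduce`, `map_fn`, `reduce_fn` and the sample rows are all hypothetical names chosen for illustration. Map emits the group-by attribute as the key, and reduce plays the role of `AVG` over each group.

```python
from collections import defaultdict

def map_fn(record):
    # Emit (group-by attribute, value to aggregate); here (dept, salary).
    dept, salary = record
    yield dept, salary

def reduce_fn(key, values):
    # Aggregate all values that share one key, like AVG(salary) per group.
    values = list(values)
    return key, sum(values) / len(values)

def run_mapreduce(records, map_fn, reduce_fn):
    # "Shuffle" step: collect the intermediate values of each key into a list.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))

rows = [("sales", 10.0), ("eng", 30.0), ("sales", 20.0), ("eng", 50.0)]
averages = run_mapreduce(rows, map_fn, reduce_fn)
print(averages)
```

The result corresponds roughly to `SELECT dept, AVG(salary) ... GROUP BY dept` over the sample rows.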

Page 18: APWeb’2011 Tutorial

Pseudo-code

map(String input_key, String input_value):
    // input_key: document name
    // input_value: document contents
    for each word w in input_value:
        EmitIntermediate(w, "1");

reduce(String output_key, Iterator intermediate_values):
    // output_key: a word
    // intermediate_values: a list of counts
    int result = 0;
    for each v in intermediate_values:
        result += ParseInt(v);
    Emit(AsString(result));
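For readers who want to run the word-count pseudo-code, here is a minimal single-machine Python rendering. The function names (`map_words`, `reduce_counts`, `word_count`) and the sample documents are assumptions made for this sketch; a real MapReduce runtime would distribute the map and reduce calls across workers.

```python
from collections import defaultdict

def map_words(doc_name, contents):
    # Emit (word, 1) for every word in the document contents.
    for word in contents.split():
        yield word, 1

def reduce_counts(word, counts):
    # Sum the list of counts collected for one word.
    return sum(counts)

def word_count(documents):
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_words(name, contents):
            intermediate[word].append(count)
    return {w: reduce_counts(w, c) for w, c in intermediate.items()}

counts = word_count({"a.txt": "the cat sat", "b.txt": "the cat ran"})
print(counts)
```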

Page 19: APWeb’2011 Tutorial

MapReduce: Execution overview

Page 20: APWeb’2011 Tutorial

MapReduce: Example

Page 21: APWeb’2011 Tutorial

Research works

• MapReduce is slower than parallel databases by a factor of 3.1 to 6.5 [1][2]

• By adopting a hybrid architecture, the performance of MapReduce can approach that of parallel databases [3]

• MapReduce is an efficient tool [4]

• Numerous discussions in the MapReduce community …

[1] A comparison of approaches to large-scale data analysis. SIGMOD 2009
[2] MapReduce and Parallel DBMSs: friends or foes? CACM 2010
[3] HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. VLDB 2009
[4] MapReduce: a flexible data processing tool. CACM 2010

Page 22: APWeb’2011 Tutorial

An in-depth study (1)

• Scheduling
• I/O modes
• Record parsing
• Indexing

A performance study of MapReduce (Hadoop) on a 100-node cluster of Amazon EC2 with various levels of parallelism. [5]

[5] Dawei Jiang, Beng Chin Ooi, Lei Shi, Sai Wu: The Performance of MapReduce: An In-depth Study. PVLDB 3(1): 472-483 (2010)

Page 23: APWeb’2011 Tutorial

An in-depth study (2)

• By carefully tuning these factors, the overall performance of Hadoop can be comparable to that of parallel database systems.

• Possible to build a cloud data processing system that is both elastically scalable and efficient

[5] Dawei Jiang, Beng Chin Ooi, Lei Shi, Sai Wu: The Performance of MapReduce: An In-depth Study. PVLDB 3(1): 472-483 (2010)

Page 24: APWeb’2011 Tutorial

Osprey: Implementing MapReduce-Style Fault Tolerance in a Shared-Nothing Distributed Database

Christopher Yang, Christine Yen, Ceryen Tan, Samuel Madden: Osprey: Implementing MapReduce-style fault tolerance in a shared-nothing distributed database. ICDE 2010: 657-668

Page 25: APWeb’2011 Tutorial

Problem proposed

• Problem: node failures in a distributed database system
– Faults may be common on large clusters of machines

• Existing solution: abort and (possibly) restart the query
– A reasonable approach for short OLTP-style queries
– But it wastes time for analytical (OLAP) warehouse queries

Page 26: APWeb’2011 Tutorial

MapReduce-style fault tolerance for a SQL database

• Break up a SQL query (or program) into smaller, parallelizable subqueries

• Adapt the load balancing strategy of greedy assignment of work
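The idea of greedy work assignment can be illustrated with a small simulation. This is only a sketch of the general technique, not Osprey's actual scheduler: `greedy_schedule`, the cost values and the worker count are hypothetical. Each subquery goes to whichever worker becomes idle first, so fast workers naturally take on more subqueries.

```python
import heapq

def greedy_schedule(subquery_costs, num_workers):
    """Assign each subquery to the worker that becomes idle first."""
    # Min-heap of (time_when_idle, worker_id).
    idle_at = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(idle_at)
    assignment = {w: [] for w in range(num_workers)}
    for sq_id, cost in enumerate(subquery_costs):
        t, w = heapq.heappop(idle_at)           # next idle worker
        assignment[w].append(sq_id)
        heapq.heappush(idle_at, (t + cost, w))  # busy until t + cost
    makespan = max(t for t, _ in idle_at)
    return assignment, makespan

assignment, makespan = greedy_schedule([4, 2, 2, 1, 3], num_workers=2)
print(assignment, makespan)
```

Because the query is already broken into small subqueries, a failed worker only costs the subqueries it was running, which can be put back in the queue and picked up by another worker.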

Page 27: APWeb’2011 Tutorial

Osprey

• A middleware implementation of MapReduce-style fault tolerance for a SQL database

Page 28: APWeb’2011 Tutorial

SQL procedure

[Figure: a query enters the Query Transformer, which produces subqueries that are placed in partition work queues (PWQ A, PWQ B, PWQ C); the Scheduler dispatches subqueries for execution, and the Result Merger combines their outputs into the final result]

PWQ: partition work queue

Page 29: APWeb’2011 Tutorial

MapReduce Online Evaluation Platform

Page 30: APWeb’2011 Tutorial

Construction


Page 31: APWeb’2011 Tutorial

Cloud data management

Page 32: APWeb’2011 Tutorial

Four new principles in Cloud-based data management

Page 33: APWeb’2011 Tutorial

New principle in cloud data management (1)

• Partition everything and use key-value storage

• “Partition everything in order to manage it”

• First normal form (1NF) cannot be satisfied

Page 34: APWeb’2011 Tutorial

New principle in cloud data management (2)

• Embrace inconsistency

• “Tolerating differences leads to great harmony”

• ACID properties are not satisfied

Page 35: APWeb’2011 Tutorial

New principle in cloud data management (3)

• Back up everything with three copies

• “A wily hare keeps three burrows and so can sleep soundly”

• Guarantee 99.999999% safety

Page 36: APWeb’2011 Tutorial

New principle in cloud data management (4)

• Scalable and high performance

• “Plan across a vast ocean, with the capacity to hold it all”

Page 37: APWeb’2011 Tutorial

Cloud data management

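• Partition everything (“Partition everything in order to manage it”)
• Embrace inconsistency (“Tolerating differences leads to great harmony”)
• Back up data with three copies (“A wily hare keeps three burrows and so can sleep soundly”)
• Scalable and high performance (“Plan across a vast ocean, with the capacity to hold it all”)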

Page 38: APWeb’2011 Tutorial

BigTable: A Distributed Storage System for Structured Data

Page 39: APWeb’2011 Tutorial

Introduction

• BigTable is a distributed storage system for managing structured data.

• Designed to scale to a very large size
– Petabytes of data across thousands of servers

• Used for many Google projects
– Web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, …

• Flexible, high-performance solution for all of Google’s products

Page 40: APWeb’2011 Tutorial

Why not just use commercial DB?

• Scale is too large for most commercial databases
• Even if it weren’t, the cost would be very high
– Building internally means the system can be applied across many projects for low incremental cost
• Low-level storage optimizations help performance significantly
– Much harder to do when running on top of a database layer

Page 41: APWeb’2011 Tutorial

Goals

• Want asynchronous processes to be continuously updating different pieces of data
– Want access to the most current data at any time
• Need to support:
– Very high read/write rates (millions of ops per second)
– Efficient scans over all or interesting subsets of data
– Efficient joins of large one-to-one and one-to-many datasets
• Often want to examine data changes over time
– E.g. contents of a web page over multiple crawls

Page 42: APWeb’2011 Tutorial

BigTable

• Distributed multi-level map
• Fault-tolerant, persistent
• Scalable
– Thousands of servers
– Terabytes of in-memory data
– Petabytes of disk-based data
– Millions of reads/writes per second, efficient scans
• Self-managing
– Servers can be added/removed dynamically
– Servers adjust to load imbalance

Page 43: APWeb’2011 Tutorial

Basic Data Model

• A BigTable is a sparse, distributed persistent multi-dimensional sorted map

(row, column, timestamp) -> cell contents

• Good match for most Google applications
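To make the (row, column, timestamp) -> cell contents mapping concrete, here is a toy in-memory version in Python. The class `TinyTable` and its methods are invented for this sketch and are not Bigtable's API; the point is only that rows stay sorted, cells hold several timestamped versions, and a read returns the newest version at or before a requested timestamp.

```python
import bisect

class TinyTable:
    """Toy sorted map: (row, column, timestamp) -> value. Not Bigtable's API."""

    def __init__(self):
        self._cells = {}   # (row, column) -> list of (timestamp, value), ascending
        self._rows = []    # sorted row keys, used for range scans

    def put(self, row, column, timestamp, value):
        i = bisect.bisect_left(self._rows, row)
        if i == len(self._rows) or self._rows[i] != row:
            self._rows.insert(i, row)   # row creation is implicit on first write
        bisect.insort(self._cells.setdefault((row, column), []), (timestamp, value))

    def get(self, row, column, timestamp=None):
        versions = self._cells.get((row, column), [])
        if timestamp is not None:
            versions = [v for v in versions if v[0] <= timestamp]
        return versions[-1][1] if versions else None   # newest qualifying version

    def scan(self, start_row, end_row):
        # Row keys in the half-open range [start_row, end_row), in sorted order.
        lo = bisect.bisect_left(self._rows, start_row)
        hi = bisect.bisect_left(self._rows, end_row)
        return self._rows[lo:hi]

t = TinyTable()
t.put("com.cnn.www", "contents:", 3, "<html>v3")
t.put("com.cnn.www", "contents:", 5, "<html>v5")
t.put("com.example", "contents:", 4, "<html>ex")
latest = t.get("com.cnn.www", "contents:")
older = t.get("com.cnn.www", "contents:", timestamp=4)
rows = t.scan("com.cnn", "com.d")
```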

Page 44: APWeb’2011 Tutorial

WebTable Example

• Want to keep copy of a large collection of web pages and related information

• Use URLs as row keys
• Various aspects of a web page as column names
• Store contents of web pages in the contents: column under the timestamps when they were fetched.

Page 45: APWeb’2011 Tutorial

Rows

• Name is an arbitrary string
– Access to data in a row is atomic
– Row creation is implicit upon storing data
• Rows are ordered lexicographically
– Rows close together lexicographically are usually on one or a small number of machines

Page 46: APWeb’2011 Tutorial

Rows (cont.)

Reads of short row ranges are efficient and typically require communication with a small number of machines.

• Can exploit this property by selecting row keys so they get good locality for data access.

• Example: math.gatech.edu, math.uga.edu, phys.gatech.edu, phys.uga.edu

VS

edu.gatech.math, edu.gatech.phys, edu.uga.math, edu.uga.phys
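The effect of the two key layouts can be checked directly by sorting. In this small sketch the helper `reversed_domain_key` is a hypothetical name; it reverses the dot-separated components of a hostname so that hosts of the same institution sort next to each other.

```python
def reversed_domain_key(hostname):
    # "math.gatech.edu" -> "edu.gatech.math"
    return ".".join(reversed(hostname.split(".")))

hosts = ["math.gatech.edu", "phys.uga.edu", "math.uga.edu", "phys.gatech.edu"]

plain_order = sorted(hosts)
locality_order = sorted(reversed_domain_key(h) for h in hosts)

print(plain_order)      # departments interleave across institutions
print(locality_order)   # each institution's rows are contiguous
```

With the reversed keys, a short row-range scan such as "everything under edu.gatech" touches one contiguous run of rows.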

Page 47: APWeb’2011 Tutorial

Columns

• Columns have a two-level name structure:
– family:optional_qualifier
• Column family
– Unit of access control
– Has associated type information
• Qualifier gives unbounded columns
– Additional levels of indexing, if desired

Page 48: APWeb’2011 Tutorial

Timestamps

• Used to store different versions of data in a cell
– New writes default to the current time, but timestamps for writes can also be set explicitly by clients
• Lookup options:
– “Return most recent K values”
– “Return all values in timestamp range (or all values)”
• Column families can be marked with attributes:
– “Only retain most recent K values in a cell”
– “Keep values until they are older than K seconds”
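The two retention attributes above amount to simple filters over a cell's version list. The following sketch assumes a cell is a list of (timestamp, value) pairs with timestamps in seconds; the function names are hypothetical and this is not how Bigtable implements garbage collection internally.

```python
def keep_most_recent(versions, k):
    """'Only retain most recent K values in a cell' (newest first)."""
    return sorted(versions, reverse=True)[:k]

def keep_younger_than(versions, now, max_age_seconds):
    """'Keep values until they are older than K seconds'."""
    return [(ts, v) for ts, v in versions if now - ts <= max_age_seconds]

versions = [(100, "a"), (200, "b"), (300, "c")]
newest_two = keep_most_recent(versions, 2)
recent = keep_younger_than(versions, now=310, max_age_seconds=120)
```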

Page 49: APWeb’2011 Tutorial

Implementation – Three Major Components

• Library linked into every client
• One master server
– Responsible for:
  • Assigning tablets to tablet servers
  • Detecting addition and expiration of tablet servers
  • Balancing tablet-server load
  • Garbage collection
• Many tablet servers
– Each tablet server handles read and write requests to its tablets
– Splits tablets that have grown too large

Page 50: APWeb’2011 Tutorial

Tablets

• Large tables are broken into tablets at row boundaries
– A tablet holds a contiguous range of rows
• Clients can often choose row keys to achieve locality
– Aim for ~100MB to 200MB of data per tablet
• Serving machine responsible for ~100 tablets
– Fast recovery:
  • 100 machines each pick up 1 tablet for a failed machine
– Fine-grained load balancing:
  • Migrate tablets away from an overloaded machine
  • Master makes load-balancing decisions

Page 51: APWeb’2011 Tutorial

Refinements: Locality Groups

• Can group multiple column families into a locality group
– A separate SSTable is created for each locality group in each tablet.
• Segregating column families that are not typically accessed together enables more efficient reads.
– In WebTable, page metadata can be in one group and contents of the page in another group.

Page 52: APWeb’2011 Tutorial

Refinements: Compression

• Many opportunities for compression
– Similar values in the same row/column at different timestamps
– Similar values in different columns
– Similar values across adjacent rows
• Two-pass custom compression scheme
– First pass: compress long common strings across a large window
– Second pass: look for repetitions in a small window
• Speed emphasized, but good space reduction (10-to-1)

Page 53: APWeb’2011 Tutorial

Refinements: Bloom Filters

• A read operation has to read from disk when the desired SSTable isn’t in memory
• Reduce the number of accesses by specifying a Bloom filter.
– Allows us to ask if an SSTable might contain data for a specified row/column pair.
– A small amount of memory for Bloom filters drastically reduces the number of disk seeks for read operations
– Their use means that most lookups for non-existent rows or columns do not need to touch disk
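A Bloom filter of the kind described above can be sketched in a few lines. This is a generic textbook construction, not Bigtable's implementation: the class name, the bit-array size, the number of hash functions and the use of SHA-256 are all assumptions made for illustration. The key property is that `might_contain` never returns False for a key that was added, so a False answer lets a read skip the SSTable entirely.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

sstable_filter = BloomFilter()
empty_answer = sstable_filter.might_contain("com.cnn.www/contents:")
sstable_filter.add("com.cnn.www/contents:")
present_answer = sstable_filter.might_contain("com.cnn.www/contents:")
```

False positives are possible (a True answer for a key that was never added), which is why the filter only saves disk seeks and never replaces the actual lookup.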

Page 54: APWeb’2011 Tutorial

PNUTS / SHERPA

To Help You Scale Your Mountains of Data

Page 55: APWeb’2011 Tutorial

Yahoo! Serving Storage Problem

– Small records – 100KB or less

– Structured records – lots of fields, evolving

– Extreme data scale - Tens of TB

– Extreme request scale - Tens of thousands of requests/sec

– Low latency globally - 20+ datacenters worldwide

– High Availability - outages cost $millions

– Variable usage patterns - as applications and users change


Page 56: APWeb’2011 Tutorial

What is PNUTS/Sherpa?

• Parallel database
• Geographic replication
• Structured, flexible schema
• Hosted, managed infrastructure

CREATE TABLE Parts (
    ID VARCHAR,
    StockNumber INT,
    Status VARCHAR
    …
)

ID  StockNumber  Status
A   42342        E
B   42521        W
C   66354        W
D   12352        E
E   75656        C
F   15677        E

Page 57: APWeb’2011 Tutorial

What Will It Become?

[Figure: three copies of the Parts table (rows A to F), captioned “Indexes and views”]

Page 58: APWeb’2011 Tutorial

Technology Elements

• Applications
• PNUTS API, Tabular API
• PNUTS: query planning and execution, index maintenance
• Distributed infrastructure for tabular data: data partitioning, update consistency, replication
• YDOT FS: ordered tables
• YDHT FS: hash tables
• Tribble: pub/sub messaging
• Zookeeper: consistency service
• YCA: authorization

Page 59: APWeb’2011 Tutorial

Data Manipulation

• Per-record operations
– Get
– Set
– Delete

• Multi-record operations
– Multiget
– Scan
– Getrange

Page 60: APWeb’2011 Tutorial

Tablets: Hash Table

Name        Description                            Price
Grape       Grapes are good to eat                 $12
Lime        Limes are green                        $9
Apple       Apple is wisdom                        $1
Strawberry  Strawberry shortcake                   $900
Orange      Arrgh! Don’t get scurvy!               $2
Avocado     But at what price?                     $3
Lemon       How much did you pay for this lemon?   $1
Tomato      Is this a vegetable?                   $14
Banana      The perfect fruit                      $2
Kiwi        New Zealand                            $8

Tablet boundaries (hash of name): 0x0000, 0x2AF3, 0x911F, 0xFFFF
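The hash-table layout on this slide can be sketched as follows: a key is hashed into the 16-bit space 0x0000 to 0xFFFF, and the split points 0x2AF3 and 0x911F from the slide decide which tablet holds it. The hash function here (first two bytes of MD5) is purely an assumption for illustration, since the slides do not say which hash PNUTS uses; `tablet_for_hash` and `tablet_for_key` are hypothetical names.

```python
import bisect
import hashlib

# Split points taken from the slide; they cut 0x0000..0xFFFF into three tablets.
SPLIT_POINTS = [0x2AF3, 0x911F]

def hash16(key):
    # Assumed hash: first two bytes of MD5, giving a value in 0x0000..0xFFFF.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:2], "big")

def tablet_for_hash(h):
    # Tablet 0: [0x0000, 0x2AF3), tablet 1: [0x2AF3, 0x911F), tablet 2: [0x911F, 0xFFFF]
    return bisect.bisect_right(SPLIT_POINTS, h)

def tablet_for_key(key):
    return tablet_for_hash(hash16(key))

print({name: tablet_for_key(name) for name in ["Apple", "Lemon", "Grape"]})
```

Hashing spreads keys evenly across tablets but destroys key order, which is why range scans are served from the ordered table organization instead.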

Page 61: APWeb’2011 Tutorial

Tablets: Ordered Table

Name        Description                            Price
Apple       Apple is wisdom                        $1
Avocado     But at what price?                     $3
Banana      The perfect fruit                      $2
Grape       Grapes are good to eat                 $12
Kiwi        New Zealand                            $8
Lemon       How much did you pay for this lemon?   $1
Lime        Limes are green                        $9
Orange      Arrgh! Don’t get scurvy!               $2
Strawberry  Strawberry shortcake                   $900
Tomato      Is this a vegetable?                   $14

Tablet boundaries (by name): A, H, Q, Z

Page 62: APWeb’2011 Tutorial

Flexible Schema

Posted date  Listing id  Item   Price
6/1/07       424252      Couch  $570
6/1/07       763245      Bike   $86
6/3/07       211242      Car    $1123
6/5/07       421133      Lamp   $15

Some records carry additional fields: Color (Red) and Condition (Good, Fair).

Page 63: APWeb’2011 Tutorial

Detailed Architecture

[Figure: architecture diagram showing clients, the REST API, routers, the tablet controller and storage units in the local region, together with Tribble and the remote regions]

Page 64: APWeb’2011 Tutorial

Tablet Splitting and Balancing

• Each storage unit has many tablets (horizontal partitions of the table)
• Tablets may grow over time
• Overfull tablets split
• A storage unit may become a hotspot
• Shed load by moving tablets to other servers
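The two mechanisms on this slide, splitting and load shedding, can be sketched as below. The policies shown (split at the median key, move one tablet from the most loaded unit to the least loaded one) are assumptions chosen for simplicity; the slides do not describe the exact policy PNUTS uses, and all names are hypothetical.

```python
def split_if_overfull(tablet, max_records):
    """tablet: sorted list of (key, value). Returns one or two tablets."""
    if len(tablet) <= max_records:
        return [tablet]
    mid = len(tablet) // 2            # split at the median key
    return [tablet[:mid], tablet[mid:]]

def rebalance_once(units):
    """units: storage unit name -> list of per-tablet loads.
    Move one tablet from the hottest unit to the coolest one."""
    hot = max(units, key=lambda u: sum(units[u]))
    cold = min(units, key=lambda u: sum(units[u]))
    if hot != cold and units[hot]:
        units[cold].append(units[hot].pop())
    return units

tablet = [("Apple", 1), ("Banana", 2), ("Grape", 12), ("Kiwi", 8)]
halves = split_if_overfull(tablet, max_records=3)
units = rebalance_once({"su1": [5, 4, 3], "su2": [1]})
```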

Page 65: APWeb’2011 Tutorial

QUERY PROCESSING


Page 66: APWeb’2011 Tutorial

Accessing Data

1. Client → router: get key k
2. Router → storage unit (SU): get key k
3. SU → router: record for key k
4. Router → client: record for key k

Page 67: APWeb’2011 Tutorial

Bulk Read

1. Client → scatter/gather server: {k1, k2, … kn}
2. Scatter/gather server → storage units (SU): get k1, get k2, get k3

Page 68: APWeb’2011 Tutorial

Range Queries in YDOT

• Clustered, ordered retrieval of records

Router: key ranges and the storage units that serve them
• Up to Cantaloupe: Storage unit 1
• Cantaloupe to Lime: Storage unit 3
• Lime to Strawberry: Storage unit 2
• Strawberry onward: Storage unit 1

Tablets (in key order):
• Apple, Avocado, Banana, Blueberry
• Cantaloupe, Grape, Kiwi, Lemon
• Lime, Mango, Orange
• Strawberry, Tomato, Watermelon

Example: the range query Grapefruit…Pear is split into Grapefruit…Lime and Lime…Pear.
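The router's job in this example can be sketched with a sorted list of tablet boundaries. The boundary keys and storage-unit names come from the slide; the function `plan_range_scan` and the list-based interval map are assumptions for illustration, not the actual PNUTS router. The function splits a range query into one sub-range per tablet it overlaps.

```python
import bisect

# Lower bound of each tablet's key interval and the storage unit serving it.
BOUNDARIES = ["", "Cantaloupe", "Lime", "Strawberry"]
UNITS = ["Storage unit 1", "Storage unit 3", "Storage unit 2", "Storage unit 1"]

def plan_range_scan(start, end):
    """Split the key range [start, end) into per-tablet sub-ranges."""
    first = bisect.bisect_right(BOUNDARIES, start) - 1   # tablet containing start
    plan = []
    for i in range(first, len(BOUNDARIES)):
        lo = max(start, BOUNDARIES[i])
        hi = BOUNDARIES[i + 1] if i + 1 < len(BOUNDARIES) else None
        if hi is None or hi >= end:
            plan.append((UNITS[i], lo, end))   # last tablet touched by the range
            break
        plan.append((UNITS[i], lo, hi))
    return plan

plan = plan_range_scan("Grapefruit", "Pear")
for unit, lo, hi in plan:
    print(f"{unit}: {lo}…{hi}")
```

For the query Grapefruit…Pear this yields one sub-range for the tablet starting at Cantaloupe and one for the tablet starting at Lime, matching the split shown on the slide.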

Page 69: APWeb’2011 Tutorial

Updates

Components: routers, message brokers, storage units (SU)

1. Write key k
2. Write key k
3. Write key k
4. (unlabeled in the slide)
5. SUCCESS
6. Write key k
7. Sequence # for key k
8. Sequence # for key k

Page 70: APWeb’2011 Tutorial

SHERPA IN CONTEXT

Page 71: APWeb’2011 Tutorial

Types of Record Stores

• Query expressiveness (from simple to feature rich)
– S3: object retrieval
– PNUTS: retrieval from a single table of objects/records
– Oracle: SQL

Page 72: APWeb’2011 Tutorial

Types of Record Stores

• Consistency model (from best effort to strong guarantees)
– S3: eventual consistency
– PNUTS: timeline consistency
– Oracle: ACID
– Scope: object-centric consistency versus program-centric consistency

Page 73: APWeb’2011 Tutorial

Types of Record Stores

• Data model (from flexibility and schema evolution to optimization for fixed schemas)
– CouchDB
– PNUTS
– Oracle
– Scope: object-centric consistency versus consistency that spans objects

Page 74: APWeb’2011 Tutorial

Types of Record Stores

• Elasticity (ability to add resources on demand), from inelastic to elastic
– Oracle: limited (via data distribution)
– PNUTS, S3: VLSD (Very Large Scale Distribution/Replication)

Page 75: APWeb’2011 Tutorial

Application Design Space

[Figure: systems placed along two axes, records versus files and “get a few things” versus “scan everything”. Systems shown: Sherpa, MObStor, Everest, Hadoop, YMDB, MySQL, Filer, Oracle, BigTable]

Page 76: APWeb’2011 Tutorial

Alternatives Matrix

[Table: systems compared (rows): Sherpa, Y! UDB, MySQL, Oracle, HDFS, BigTable, Dynamo, Cassandra. Criteria (columns): elastic, operability, global low latency, availability, structured access, updates, consistency model, SQL/ACID]

Page 77: APWeb’2011 Tutorial


Page 78: APWeb’2011 Tutorial

The CAP Theorem

Consistency

Partition tolerance

Availability

Page 79: APWeb’2011 Tutorial

The CAP Theorem

• Consistency: once a writer has written, all readers will see that write
• Partition tolerance
• Availability

Page 80: APWeb’2011 Tutorial

The CAP Theorem

• Consistency
• Partition tolerance
• Availability: the system is available during software and hardware upgrades and node failures

Page 81: APWeb’2011 Tutorial

The CAP Theorem

• Consistency
• Partition tolerance: a system can continue to operate in the presence of network partitions
• Availability

Page 82: APWeb’2011 Tutorial

The CAP Theorem

Theorem: You can have at most two of these properties for any shared-data system

Consistency

Partition tolerance

Availability

Page 83: APWeb’2011 Tutorial

Consistency

• Two kinds of consistency:
– Strong consistency: ACID (Atomicity, Consistency, Isolation, Durability)
– Weak consistency: BASE (Basically Available, Soft-state, Eventual consistency)

Page 84: APWeb’2011 Tutorial

A tailor

• 3NF
• Transaction
• Lock
• ACID
• Safety
• RDBMS

Page 85: APWeb’2011 Tutorial

Datalog

• Main expressive advantage: recursive queries.

• More convenient for analysis: papers look better.

• Without recursion but with negation it is equivalent in power to relational algebra

• Has affected real practice: (e.g., recursion in SQL3, magic sets transformations).

Page 86: APWeb’2011 Tutorial

Datalog

• Example Datalog program:

parent(bill,mary).
parent(mary,john).

ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- parent(X,Z),ancestor(Z,Y).

?- ancestor(bill,X)
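The recursive ancestor program can be evaluated bottom-up by applying the rules repeatedly until no new facts appear. The Python sketch below uses naive fixpoint iteration, which is one standard way to evaluate such a program and not the only one; the function name `ancestors` is hypothetical.

```python
def ancestors(parent_facts):
    """Naive bottom-up evaluation of the two ancestor rules to a fixpoint."""
    ancestor = set(parent_facts)          # ancestor(X,Y) :- parent(X,Y).
    while True:
        # ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
        derived = {(x, y)
                   for (x, z) in parent_facts
                   for (z2, y) in ancestor
                   if z == z2}
        if derived <= ancestor:           # nothing new: fixpoint reached
            return ancestor
        ancestor |= derived

facts = {("bill", "mary"), ("mary", "john")}
result = ancestors(facts)
answers = sorted(y for (x, y) in result if x == "bill")   # ?- ancestor(bill,X)
print(answers)
```

The set of derived facts only ever grows during evaluation, which is the monotonic behavior that the conjectures on the following slides build on.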

Page 87: APWeb’2011 Tutorial

Joseph’s Conjecture (1)

• CONJECTURE 1. Consistency And Logical Monotonicity (CALM).

• A program has an eventually consistent, coordination-free execution strategy if and only if it is expressible in (monotonic) Datalog.

Page 88: APWeb’2011 Tutorial

Joseph’s Conjecture (2)

• CONJECTURE 2. Causality Required Only for Non-monotonicity (CRON).

• Program semantics require causal message ordering if and only if the messages participate in non-monotonic derivations.

Page 89: APWeb’2011 Tutorial

Joseph’s Conjecture (3)

• CONJECTURE 3. The minimum number of Dedalus timesteps required to evaluate a program on a given input data set is equivalent to the program’s Coordination Complexity.

Page 90: APWeb’2011 Tutorial

Joseph’s Conjecture (4)

• CONJECTURE 4. Any Dedalus program P can be rewritten into an equivalent temporally-minimized program P’ such that each inductive or asynchronous rule of P’ is necessary: converting that rule to a deductive rule would result in a program with no unique minimal model.

Page 91: APWeb’2011 Tutorial

Circumstance has presented a rare opportunity—call it an imperative—for the database community to take its place in the sun, and help create a new environment for parallel and distributed computation to flourish.

------Joseph M. Hellerstein (UC Berkeley)

Page 92: APWeb’2011 Tutorial

Open questions and conclusion

Page 93: APWeb’2011 Tutorial

Open Questions

• What is the right consistency model?
• What is the right programming model?
• Whether and how to make use of caching?
• How to balance functionality and scale?
• What are the right cloud abstractions?
• Cloud interoperability

VLDB 2010 Tutorial

Page 94: APWeb’2011 Tutorial

Concluding

• Data management for cloud computing poses a fundamental challenge to database researchers:
– Scalability
– Reliability
– Data consistency
– Elasticity

• The database community needs to be involved; maintaining the status quo will only marginalize our role.

Page 95: APWeb’2011 Tutorial

New Textbook “Distributed Systems and Cloud Computing” (《分布式系统与云计算》)

• Introduction to distributed systems
• Survey of distributed cloud computing technologies
• Distributed cloud computing platforms
• Distributed cloud computing program development

Page 96: APWeb’2011 Tutorial

Further Reading

F. Chang et al. Bigtable: A distributed storage system for structured data. In OSDI, 2006.

J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, 2004.

G. DeCandia et al. Dynamo: Amazon’s highly available key-value store. In SOSP, 2007.

S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google File System. In Proc. SOSP, 2003.

D. Kossmann. The state of the art in distributed query processing. ACM Computing Surveys, 32(4):422–469, 2000.

Page 97: APWeb’2011 Tutorial

Further Reading

Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan. Efficient Bulk Insertion into a Distributed Ordered Table. SIGMOD 2008.

Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni. PNUTS: Yahoo!'s Hosted Data Serving Platform. VLDB 2008.

Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, Raghu Ramakrishnan. Asynchronous View Maintenance for VLSD Databases. SIGMOD 2009.

Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. Beautiful Data, O’Reilly Media, 2009.


Page 99: APWeb’2011 Tutorial

Thanks!