Large-Scale Data and Computation

Yifu Huang

School of Computer Science, Fudan University
huangyifu@fudan.edu.cn

COMP620003 Advanced Computer Networks Report, 2013

Yifu Huang (FDU CS) COMP620003 Report 2013/12/5 1 / 27

Outline

1 Motivation

2 GFS

3 MapReduce

4 Bigtable

5 Discussion

Motivation

Motivation

Why do we choose these papers?

Big data; Google

Why do we need GFS, MapReduce, and Bigtable?

GFS: a scalable distributed file system for large distributed data-intensive applications
MapReduce: a programming model for processing and generating large data sets that is amenable to a broad variety of real-world tasks
Bigtable: a distributed storage system for managing structured data that is designed to scale to a very large size

What can we learn from these papers?

The design and implementation ideas behind GFS, MapReduce, and Bigtable
Getting ready to enjoy their open-source equivalents: HDFS, YARN, and HBase

Motivation

Ecosystem

GFS

The Google File System

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Google, Inc.

GFS

Design Overview

The system is built from many inexpensive commodity components that often fail
The system stores a modest number of large files
The workloads primarily consist of two kinds of reads: large streaming reads and small random reads
The workloads also have many large, sequential writes that append data to files
The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file
High sustained bandwidth is more important than low latency

GFS

Design Overview (cont.)

Architecture

GFS

Design Overview (cont.)

Chunk Size: 64 MB
Metadata

The file and chunk namespaces
The mapping from files to chunks
The locations of each chunk’s replicas

Chunk Locations
Operation Log
Consistency Model

Guarantees by GFS
Implications for Applications
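One consequence of the fixed 64 MB chunk size: a client can translate any file byte offset into a chunk index locally, and only then ask the master for that chunk's handle and replica locations. A minimal sketch of that translation (illustrative, not actual GFS client code):

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size used by GFS

def locate(byte_offset: int) -> tuple[int, int]:
    """Translate a file byte offset into (chunk index, offset within chunk).

    The client sends the chunk index to the master; the master replies with
    the chunk handle and replica locations, and the data itself is read from
    a chunkserver, never from the master.
    """
    return byte_offset // CHUNK_SIZE, byte_offset % CHUNK_SIZE
```

For example, a read at byte 200,000,000 falls in chunk index 2, since two full chunks cover only the first 134,217,728 bytes.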

GFS

System Interactions

Leases and Mutation Order

GFS

System Interactions (cont.)

Data Flow

We decouple the flow of data from the flow of control to use the network efficiently
Our goals are to fully utilize each machine’s network bandwidth, avoid network bottlenecks and high-latency links, and minimize the latency to push through all the data

Atomic Record Appends

GFS provides an atomic append operation called record append

Snapshot

The snapshot operation makes a copy of a file or a directory tree almost instantaneously, while minimizing any interruptions of ongoing mutations
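The key property of record append is that the client supplies only the data; the primary replica chooses the offset and returns it, so concurrent appenders never overwrite one another. A toy single-replica sketch of that contract (names are illustrative, not the real GFS interface):

```python
class ChunkSketch:
    """Toy model of one chunk's primary replica handling record appends."""

    def __init__(self) -> None:
        self.data = bytearray()

    def record_append(self, record: bytes) -> int:
        # The primary, not the client, picks the write offset: it appends at
        # the current end of the chunk and returns that offset to the client.
        offset = len(self.data)
        self.data += record
        return offset

chunk = ChunkSketch()
a = chunk.record_append(b"client-1 record")
b = chunk.record_append(b"client-2 record")
# The two appends land at distinct, non-overlapping offsets.
```

In real GFS the primary also forwards the append to the secondary replicas and may pad a chunk and retry on the next one, which is why records are delivered at least once rather than exactly once.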

GFS

Fault Tolerance And Diagnosis

High Availability

Fast Recovery
Chunk Replication
Master Replication

Data Integrity
Diagnostic Tools

MapReduce

MapReduce: Simplified Data Processing on Large Clusters

Jeffrey Dean and Sanjay Ghemawat

Google, Inc.

MapReduce

Model

Map

Takes an input pair and produces a set of intermediate key/value pairs
The MapReduce library groups together all intermediate values associated with the same intermediate key and passes them to the reduce function

Reduce

Accepts an intermediate key and a set of values for that key and merges these values together to form a possibly smaller set of values
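The canonical example of this model is word count: map emits a (word, 1) pair per occurrence, and reduce sums the counts for each word. A self-contained sketch, with the library's grouping ("shuffle") step done by a plain dictionary:

```python
from collections import defaultdict

def map_fn(name: str, contents: str):
    # Map: take an input pair (document name, contents) and emit
    # intermediate key/value pairs, one (word, 1) per occurrence.
    for word in contents.split():
        yield word, 1

def reduce_fn(word: str, counts):
    # Reduce: merge all values for one intermediate key into a
    # possibly smaller set of values, here a single sum.
    return sum(counts)

def word_count(documents):
    # The MapReduce library's job: group intermediate values by key,
    # then hand each group to the reduce function.
    groups = defaultdict(list)
    for name, contents in documents:
        for word, one in map_fn(name, contents):
            groups[word].append(one)
    return {word: reduce_fn(word, counts) for word, counts in groups.items()}

counts = word_count([("doc1", "to be or not to be")])
# counts == {"to": 2, "be": 2, "or": 1, "not": 1}
```

The real library runs map and reduce on many workers in parallel and does the grouping with a distributed sort, but the user-visible contract is exactly these two functions.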

MapReduce

Implementation

User Program
Master
Worker (Map/Reduce)
Fault tolerance
Locality
Task granularity
Backup tasks

MapReduce

Performance

Grep

MapReduce

Performance (cont.)

Sort

Bigtable

Bigtable: A Distributed Storage System for Structured Data

Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber

Google, Inc.

Bigtable

Model

Rows

Atomic
Tablets

Column Families

Family : qualifier
Access control

Timestamps

Decreasing order
Garbage collection

Cell

(row : string, column : string, time : int64) -> string

Bigtable

Model (cont.)

Bigtable

Storage

Region -> Store -> MemStore / StoreFile
File -> Blocks
Log -> Write-ahead log
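In the HBase terms this slide uses, a write goes to the write-ahead log first, then into the in-memory MemStore, which is flushed to an immutable, sorted StoreFile once it grows large enough; reads consult the MemStore before falling back to StoreFiles. A toy sketch of that path, with an artificially small flush threshold:

```python
class StoreSketch:
    """Toy model of the HBase-style write path: WAL -> MemStore -> StoreFile."""

    def __init__(self, flush_threshold: int = 2) -> None:
        self.wal = []            # write-ahead log: durability before anything else
        self.memstore = {}       # in-memory buffer of recent writes
        self.storefiles = []     # immutable, sorted flushed files
        self.flush_threshold = flush_threshold

    def put(self, key: str, value: str) -> None:
        self.wal.append((key, value))   # log first, so a crash can be replayed
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            # Flush: write out a sorted, immutable StoreFile, empty the MemStore.
            self.storefiles.append(dict(sorted(self.memstore.items())))
            self.memstore.clear()

    def get(self, key: str):
        # Newest data wins: check the MemStore, then StoreFiles newest-first.
        if key in self.memstore:
            return self.memstore[key]
        for sf in reversed(self.storefiles):
            if key in sf:
                return sf[key]
        return None

s = StoreSketch()
s.put("row1", "v1")
s.put("row2", "v2")   # triggers a flush: both rows move to a StoreFile
s.put("row1", "v1b")  # the newer version lives in the MemStore
```

This is also where the "Disk I/O, time consuming" drawback shows up: reads that miss the MemStore may have to check several StoreFiles, which is why the real systems periodically compact them.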

Bigtable

Implementation

Tablet Location

Bigtable

Implementation (cont.)

Tablet Serving

Bigtable

Evaluation

Benchmarks

Bigtable

Evaluation (cont.)

Scaling

Bigtable

Evaluation (cont.)

Applications

Discussion

Discussion

Contributions of GFS, MapReduce, Bigtable

Supporting large-scale data processing workloads on commodity hardware
Easy to use: hides the details of parallelization, fault tolerance, locality optimization, and load balancing
Scale well both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving)

Drawbacks of GFS, MapReduce, Bigtable

The single master may still be a potential bottleneck
Disk I/O is time-consuming
Many database operations are not supported (multi-row transactions, secondary indexes, . . . )

Appendix

References I

[1] The Google File System. SOSP. 2003.

[2] MapReduce: Simplified Data Processing on Large Clusters. OSDI. 2004.

[3] Bigtable: A Distributed Storage System for Structured Data. OSDI. 2006.

[4] The Hadoop Distributed File System. MSST. 2010.

[5] Hadoop: The Definitive Guide. 2012.

[6] HBase: The Definitive Guide. 2011.

[7] Data-Intensive Text Processing with MapReduce. 2010.

[8] The Chubby lock service for loosely-coupled distributed systems. OSDI. 2006.
