Hadoop in SIGMOD 2011
2011/5/20
Papers
◦ LCI: a social channel analysis platform for live customer intelligence
◦ Bistro data feed management system
◦ Apache Hadoop goes realtime at Facebook
◦ Nova: continuous Pig/Hadoop workflows
◦ A Hadoop based distributed loading approach to parallel data warehouses
◦ A batch of PNUTS: experiences connecting cloud batch and serving systems
Papers (Continued)
◦ Turbocharging DBMS buffer pool using SSDs
◦ Online reorganization in read optimized MMDBS
◦ Automated partitioning design in parallel database systems
◦ Oracle database filesystem
◦ Emerging trends in the enterprise data analytics: connecting Hadoop and DB2 warehouse
◦ Efficient processing of data warehousing queries in a split execution environment
◦ SQL server column store indexes
◦ An analytic data engine for visualization in Tableau
Apache Hadoop Goes Realtime at Facebook
Workload Types
Facebook Messaging
High Write Throughput
Large Tables
Data Migration
Facebook Insights
Realtime Analytics
High Throughput Increments
Facebook Metrics System (ODS)
Automatic Sharding
Fast Reads of Recent Data and Table Scans
Why Hadoop & HBase
Elasticity
High write throughput
Efficient and low-latency strong consistency semantics within a data center
Efficient random reads from disk
High Availability and Disaster Recovery
Fault Isolation
Atomic read-modify-write primitives (see the counter sketch after this list)
Range Scans
Tolerance of network partitions within a single data center
Zero Downtime in case of individual data center failure
Active-active serving capability across different data centers
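The atomic read-modify-write requirement maps directly to HBase's server-side counter increments, which is what high-throughput counters like Facebook Insights rely on. Below is a minimal sketch using the HBase 0.90-era client API; the table, row, family, and qualifier names are hypothetical, not from the paper. Each incrementColumnValue call performs the read-increment-write atomically on the RegionServer under the row lock.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class CounterIncrement {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "insights");   // hypothetical table
        long views = table.incrementColumnValue(
                Bytes.toBytes("page#42"),              // hypothetical row key
                Bytes.toBytes("stats"),                // column family
                Bytes.toBytes("views"),                // qualifier
                1L);                                   // delta, applied atomically server-side
        System.out.println("views = " + views);
        table.close();
    }
}
```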
Realtime HDFS
High Availability – AvatarNode
Hot Standby AvatarNode
Enhancements to HDFS transaction logging
Transparent Failover: DAFS(client enhancement+ZooKeeper)
Hadoop RPC compatibility
Block Availability: Placement Policy
A pluggable block placement policy
Realtime HDFS (Cont.)
Performance Improvements for a Realtime Workload
RPC Timeout
Recover File Lease
recoverLease API (lighter-weight than triggering HDFS-append recovery)
Reads from Local Replicas
New Features
HDFS sync (see the sketch after this list)
Concurrent Readers (last chunk of data)
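A minimal sketch of the sync and lease-recovery features from the two relevant points of view; the log path and class name here are hypothetical. hflush() is the later name for the 0.20-append era sync() call, and recoverLease() lets a recovering process revoke a crashed writer's lease instead of waiting for the lease timeout.

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SyncAndLeaseSketch {
    // Writer role: push log edits to the DataNode pipeline so that
    // concurrent readers can see them before the file is closed.
    static void writeAndSync(FileSystem fs, Path log) throws IOException {
        try (FSDataOutputStream out = fs.create(log)) {
            out.writeBytes("edit-1\n");
            out.hflush();   // data visible to new readers; not yet forced to disk
        }
    }

    // Recovery role: run by a different process after the writer crashes,
    // before replaying the log (the recoverLease enhancement).
    static void recoverAbandonedFile(FileSystem fs, Path log) throws IOException {
        if (fs instanceof DistributedFileSystem) {
            boolean closed = ((DistributedFileSystem) fs).recoverLease(log);
            System.out.println("file closed after lease recovery: " + closed);
        }
    }
}
```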
Production HBase
ACID Compliance (RWCC: Read-Write Consistency Control; see the sketch after this list)
Atomicity (WALEdit)
Consistency
Availability Improvements
HBase Master Rewrite
Region assignment in memory -> ZooKeeper
Online Upgrades
Distributed Log Splitting
Performance Improvements
Compaction
Read Optimizations
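A toy illustration of the RWCC idea (illustrative Java, not HBase's actual classes): writes are stamped with sequence numbers, and a read only observes writes at or below the read point snapshotted when it starts, so readers never see a partially applied write.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

// Toy read-write consistency control: versions are keyed by sequence number;
// readers snapshot the read point once and ignore later writes. Simplified:
// a real implementation advances the read point only after all writes with
// smaller sequence numbers have completed.
class RwccCell {
    private final ConcurrentSkipListMap<Long, String> versions = new ConcurrentSkipListMap<>();
    private final AtomicLong nextSeq = new AtomicLong();
    private volatile long readPoint;

    void write(String value) {
        long seq = nextSeq.incrementAndGet();
        versions.put(seq, value);
        readPoint = seq;                    // "commit": make the write visible
    }

    String read() {
        long rp = readPoint;                // snapshot once per read
        Map.Entry<Long, String> e = versions.floorEntry(rp);
        return e == null ? null : e.getValue();
    }
}
```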
Deployment and Operational Experiences
Testing
Auto Testing Tool
HBase Verify
Monitoring and Tools
HBCK
More metrics
Manual versus Automatic Splitting
Add new RegionServers, not region splitting
Dark Launch (gradual rollout)
Dashboards/ODS integration
Backups at the Application layer
Schema Changes
Importing Data
LZO & zip
Reducing Network IO
Major compaction
Nova: Continuous Pig/Hadoop Workflows
Nova Overview
Scenarios
Ingesting and analyzing user behavior logs
Building and updating a search index from a stream of crawled web pages
Processing semi-structured data feeds
Two-layer programming model (Nova over Pig)
Continuous processing
Independent scheduling
Cross-module optimization
Manageability features
Abstract Workflow Model
Workflow
Two kinds of vertices: tasks (processing steps) and channels (data containers)
Edges connect tasks to channels and channels to tasks
Edge annotations (all, new, B and Δ)
Four common patterns of processing
Non-incremental (template detection)
Stateless incremental (shingling)
Stateless incremental with lookup table (template tagging)
Stateful incremental (de-duping)
Abstract Workflow Model (Cont.)
Data and Update Model (see the sketch after this list)
Blocks: base blocks and delta blocks
Channel functions: merge, chain and diff
Task/Data Interface
Consumption mode: all or new
Production mode: B or Δ
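A minimal sketch of the block/channel model (illustrative Java over keyed records, not Nova's actual API): a channel keeps one base block plus ordered delta blocks; "all" consumption merges base and deltas, "new" consumption reads only the deltas appended since the consumer's last run, and compaction folds the deltas into a fresh base block.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative channel with one base block plus ordered delta blocks,
// using upsert semantics as the merge channel function.
class Channel {
    private Map<String, String> base = new HashMap<>();                 // base block B
    private final List<Map<String, String>> deltas = new ArrayList<>(); // delta blocks

    void appendDelta(Map<String, String> delta) { deltas.add(delta); }

    // "all" consumption: merge(B, delta_1, ..., delta_n); later deltas win per key.
    Map<String, String> readAll() {
        Map<String, String> merged = new HashMap<>(base);
        for (Map<String, String> d : deltas) merged.putAll(d);
        return merged;
    }

    // "new" consumption: only the deltas appended since the consumer's last run.
    List<Map<String, String>> readNew(int lastConsumed) {
        return deltas.subList(lastConsumed, deltas.size());
    }

    // Compaction: fold all deltas into a new base block and discard them.
    void compact() {
        base = readAll();
        deltas.clear();
    }
}
```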
Workflow Programming and Scheduling
Data Compaction and Garbage Collection
Nova System Architecture
Efficient Processing of Data Warehousing Queries in a Split Execution Environment
Introduction
Two approaches
Starting with a parallel database system and adding some MapReduce features
Starting with MapReduce and adding database system technology
HadoopDB follows the second approach
Two heuristics for HadoopDB optimizations
Database systems can process data at a faster rate than Hadoop.
Minimize the number of MapReduce jobs in the SQL execution plan.
HadoopDB
HadoopDB Architecture
Database Connector
Data Loader
Catalog
Query Interface
VectorWise/X100 Database (SIMD) vs. PostgreSQL
HadoopDB Query Execution
Selection, projection, and partial aggregation (the Map and Combine phases) pushed into the node-local database systems
co-partitioned tables
MR for redistributing data
SideDB (a "database task done on the side").
Split Query Execution
Referential Partitioning
Join inside the database engine
Local join
Tables co-partitioned on foreign keys (referential partitioning)
Split MR/DB Joins
Directed join: one of the tables is already partitioned by the join key.
Broadcast join: the small table is shipped to every node.
Adding specialized joins to the MR framework: map-side join.
Tradeoff: a temporary table is needed for the join.
Another type of join: MR redistributes the data, then a directed join runs.
Split MR/DB Semijoin: like 'foreignKey IN (listOfValues)'
Can be split into two MapReduce jobs
SideDB to eliminate the first MapReduce job
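A sketch of the SideDB-style rewrite (the class, connection string, and table/column names below are illustrative, not HadoopDB's actual code): the small dimension-side query runs once as a "database task done on the side", and its result list is inlined into the selection pushed to every node-local database, so the first MapReduce job disappears.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class SemijoinRewrite {
    public static void main(String[] args) throws Exception {
        // "Side" query against a small dimension table (hypothetical DSN/schema).
        List<String> keys = new ArrayList<>();
        try (Connection side = DriverManager.getConnection("jdbc:postgresql://host/dim");
             Statement st = side.createStatement();
             ResultSet rs = st.executeQuery("SELECT d_key FROM nation WHERE d_region = 'ASIA'")) {
            while (rs.next()) keys.add(rs.getString(1));
        }
        // Rewritten 'foreignKey IN (...)' selection, shipped to each node-local database.
        String inList = keys.stream().map(k -> "'" + k + "'").collect(Collectors.joining(","));
        String nodeQuery = "SELECT * FROM orders WHERE nation_key IN (" + inList + ")";
        System.out.println(nodeQuery);
    }
}
```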
Split Query Execution (Cont.)
Post-join Aggregation
Two MapReduce jobs
Hash-based partial aggregation saves significant I/O
A similar technique is applied to TOP N selections
Pre-join Aggregation
For MR-based joins
Applied when the cardinality of the group-by and join-key columns is smaller than that of the entire table
A Query Plan in HadoopDB
Performance
No hash partition feature in Hive
Emerging Trends in the Enterprise Data Analytics: Connecting Hadoop and DB2 Warehouse
DB2 and Hadoop/Jaql Interactions
A Hadoop Based Distributed Loading Approach to Parallel Data Warehouses
Introduction
Why Hadoop for Teradata EDW
More disk space, and space can be easily added
HDFS as storage
MapReduce
Distributed
HDFS-blocks-to-Teradata-EDW-nodes assignment problem
Parameters: n blocks, k copies, m nodes
Goal: to assign HDFS blocks to nodes evenly and minimize network traffic
Block Assignment Problem
HDFS file F on a cluster of P nodes (each node is uniquely identified with an integer i where 1 ≤ i ≤ P)
The problem is defined by: assignment(X, Y, n, m, k, r)
X is the set of n blocks (X = {1, . . . , n}) of F
Y is the set of m nodes running PDBMS (called PDBMS nodes) (Y ⊆ {1, . . . , P})
k is the number of copies of each block; m is the number of PDBMS nodes
r is the mapping recording the replica locations of each block: r(i) returns the set of nodes that hold a copy of block i.
An assignment g from the blocks in X to the nodes in Y is a mapping from X = {1, . . . , n} to Y, where g(i) = j (i ∈ X, j ∈ Y) means that block i is assigned to node j.
Block Assignment Problem (Cont.)
The problem is defined by: assignment(X, Y, n, m, k, r)
An even assignment g is an assignment such that ∀i ∈ Y ∀j ∈ Y: | |{x | 1 ≤ x ≤ n && g(x) = i}| − |{y | 1 ≤ y ≤ n && g(y) = j}| | ≤ 1.
The cost of an assignment g is defined to be cost(g) = |{i | g(i) ∉ r(i), 1 ≤ i ≤ n}|, which is the number of blocks assigned to remote nodes.
We use |g| to denote the number of blocks assigned to local nodes by g. We have |g| = n - cost(g).
The optimal assignment problem is to find an even assignment with the smallest cost.
OBA Algorithm
Example instance: assignment(X, Y, n, m, k, r) = ({1, 2, 3}, {1, 2}, 3, 2, 1, {1 → {1}, 2 → {1}, 3 → {2}})
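A greedy sketch of even assignment under these definitions (illustrative only; the paper's OBA algorithm guarantees an optimal even assignment, while a greedy pass does not in general): each node may take at most ⌈n/m⌉ blocks; a block goes to a replica holder that still has quota, otherwise to the least-loaded node at cost 1.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class EvenAssignment {
    public static void main(String[] args) {
        int n = 3;                                   // blocks X = {1, 2, 3}
        List<Integer> nodes = List.of(1, 2);         // PDBMS nodes Y, m = 2
        Map<Integer, Set<Integer>> r = Map.of(       // replica locations, k = 1
                1, Set.of(1), 2, Set.of(1), 3, Set.of(2));

        int quota = (int) Math.ceil((double) n / nodes.size()); // evenness bound
        Map<Integer, Integer> load = new HashMap<>();
        Map<Integer, Integer> g = new LinkedHashMap<>();
        int cost = 0;

        for (int block = 1; block <= n; block++) {
            // Prefer a PDBMS node that already holds a replica and has quota left.
            Integer target = r.get(block).stream()
                    .filter(nodes::contains)
                    .filter(j -> load.getOrDefault(j, 0) < quota)
                    .findFirst().orElse(null);
            if (target == null) {                    // remote assignment, cost 1
                target = nodes.stream()
                        .min(Comparator.comparingInt(j -> load.getOrDefault(j, 0)))
                        .orElseThrow();
                cost++;
            }
            g.put(block, target);
            load.merge(target, 1, Integer::sum);
        }
        System.out.println("g = " + g + ", cost = " + cost);
        // On the instance above: g = {1=1, 2=1, 3=2}, cost = 0,
        // which is even (loads 2 and 1) and therefore optimal.
    }
}
```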