TRANSCRIPT
PROGRAMMING SUPPORT AND ADAPTIVE CHECKPOINTING FOR HIGH-THROUGHPUT DATA SERVICES WITH LOG-BASED RECOVERY
DSN 2010
Jingyu Zhou, Shanghai Jiao Tong Univ.
Caijie Zhang, Google Inc.
Hong Tang, Yahoo!
Jiesheng Wu, Microsoft
Tao Yang, UC Santa Barbara
Background
- Many large-scale data-mining and offline applications at Google, Yahoo, Microsoft, Ask.com, etc. require:
  - High data parallelism/throughput
  - Data persistence
  - But not stringent availability
- E.g., the URL property service (UPS) in the Ask.com search offline mining platform: hundreds of application modules access UPS.
Examples of high-throughput data services for web mining/search
[Figure: data flow — crawlers fetch Internet web documents into document DBs; data-mining jobs read the DBs and query data/info services, e.g., a URL property service covering 10-50 billion URLs at 100K-500K requests/s.]
Existing Approaches for High Performance and Persistence
- Database systems suffer from high overhead; supporting general features limits their performance and requires more machine resources.
- Related work and well-known techniques for high availability: data replication, log-based recovery, checkpointing.
Challenges and Focus of This Work
- System design with careful selection and integration of fault-tolerant techniques for high-throughput computing. Trades off availability: some downtime is allowed.
- Low-cost logging/checkpointing.
- Fine-grained operation for minimum service disruption; local data recovery; periodic remote backup.
- Programming support: lightweight, simplifying the construction of robust data services.
SLACH: Selective Logging & Adaptive CHeckpointing
- Targeted data services: request-driven thread model, in-memory objects, data independence.
- Similar to key-value stores in BigTable/Dynamo, but with higher throughput demands.
Architecture of SLACH
Main Techniques
- Selective operation logging: only log write operations, as (oid, op_type, parameters, timestamp). Write-ahead log, i.e., write the log entry, then apply the operation.
- Object-level checkpointing to avoid service disruption, with adaptive load control: checkpoint objects one by one, still allowing concurrent access to other objects; perform checkpointing when load is low to amortize its cost.
- Lightweight API while supporting legacy code.
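The write-ahead log record described above, (oid, op_type, parameters, timestamp), can be sketched as follows. The struct layout, field widths, and the append_record helper are illustrative assumptions, not SLACH's actual on-disk format:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

/* Hypothetical layout for one write-ahead log record. In a real
 * implementation the record is flushed to disk BEFORE the operation
 * is applied to the in-memory object ("write then apply"). */
struct LogRecord {
    int64_t  oid;        /* object id */
    int32_t  op_type;    /* application-defined operation code */
    uint32_t param_len;  /* length of serialized parameters in bytes */
    int64_t  timestamp;  /* e.g. microseconds since epoch */
};

/* Append one record header followed by its parameter bytes to an
 * in-memory log buffer. */
void append_record(std::vector<char>& log, const LogRecord& rec,
                   const void* params) {
    const char* hdr = reinterpret_cast<const char*>(&rec);
    log.insert(log.end(), hdr, hdr + sizeof(rec));
    const char* p = static_cast<const char*>(params);
    log.insert(log.end(), p, p + rec.param_len);
}
```

Replay then walks the buffer record by record, dispatching on op_type, which is what the application's replay callback (shown later) does per operation.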
Object-level Checkpoints
Adaptive Checkpointing Control
- Goal: balance checkpoint cost against recovery speed.
  - Checkpointing less frequently -> larger logs -> lengthy recovery.
  - Checkpointing too often -> higher overhead.
- Ideally: high server load -> checkpoint less frequently; low server load -> checkpoint more frequently.
- Adjust between a Low Watermark (LW) and a High Watermark (HW) of service load:
  Load_curr = α × Load_prev + (1 − α) × sample
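The smoothing formula above is a standard exponentially weighted moving average. A minimal sketch (the LoadEstimator class name is illustrative; the slides report α = 0.8 and a 5 s sampling window for UPS/HIS):

```cpp
#include <cassert>

/* Exponentially weighted moving average of server load:
 *   Load_curr = alpha * Load_prev + (1 - alpha) * sample
 * A larger alpha smooths more but reacts more slowly to load changes. */
class LoadEstimator {
public:
    explicit LoadEstimator(double alpha) : alpha_(alpha), load_(0.0) {}

    /* Feed one load sample (e.g. fraction of capacity used in the
     * last sampling window) and return the updated average. */
    double update(double sample) {
        load_ = alpha_ * load_ + (1.0 - alpha_) * sample;
        return load_;
    }

    double current() const { return load_; }

private:
    double alpha_;  /* smoothing factor, e.g. 0.8 */
    double load_;   /* current smoothed load estimate */
};
```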
Adaptive Checkpointing Frequency
- Checkpoint threshold lies between LB and UB, where LB and UB are log-size parameters determined by the application:
  Threshold = LB + F(load) × (UB − LB)
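A sketch of the threshold computation above. The paper defines F precisely (involving the watermarks and the scaling factor β); here we assume a simple linear ramp clamped at the watermarks, which captures the intent — high load pushes the threshold toward UB (checkpoint less often), low load toward LB (checkpoint more often):

```cpp
#include <algorithm>
#include <cassert>

/* Compute the adaptive checkpoint threshold:
 *   Threshold = LB + F(load) * (UB - LB)
 * where F maps load in [LW, HW] onto [0, 1]. This linear F is an
 * assumption for illustration; SLACH's actual F may differ. */
double ckpt_threshold(double load, double lw, double hw,
                      double lb, double ub) {
    double f = (load - lw) / (hw - lw);   /* linear ramp over [LW, HW] */
    f = std::min(1.0, std::max(0.0, f));  /* clamp to [0, 1] */
    return lb + f * (ub - lb);
}
```

For example, with the UPS parameters reported later (LW/HW = 20%/85%, LB/UB = 1M/8M log entries), a fully loaded server would checkpoint only once the log reaches about 8M entries, while an idle one would checkpoint at about 1M.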
SLACH Programming Support
- Application developers:
  - Call SLACH function log() to log an object operation.
  - Define 3 callback functions: 1) what to checkpoint (call SLACH's ckpt() for each selected object); 2) recover one object from a checkpoint; 3) replay a log operation.
- SLACH:
  - Provides functions log() and ckpt().
  - Calls the user's checkpoint callback during checkpointing, the user's recover function during checkpoint recovery, and the user's replay function when recovering from a log.
SLACH API for Applications
class SLACH::API {
public:
/* register ckpt. policy and parameters */
void register_policy(const Policy& p);
/* log one write operation */
void log(int64_t obj_id, int op, ...);
/* checkpoint one object */
void ckpt(int64_t obj_id, const void* addr, uint32_t size);
};
SLACH Interface
class SLACH::Application {
…
protected:
/* application checkpoint callback function */
virtual void ckpt_callback()=0;
/* callback of loading one object checkpoint*/
virtual void load_one_callback(int64_t obj_id, const void* addr, uint32_t size)=0;
/* callback of replaying one operation log */
virtual void replay_one_callback(int64_t obj_id, int op, const para_vec& args)=0;
};
An Example: Application-level code

struct Item {
double price;
int quantity;
};
class MyService : public SLACH::Application {
private:
Item obj[1000];
SLACH::API slach_; /* SLACH API */
static const int OP_PRICE=0;/* an op type */
public:
void update_price(int id, double p) {
slach_.log(id, OP_PRICE, &p, sizeof(p));
obj[id].price = p;
}
(Annotations: obj[] holds the application objects being accessed; slach_.log() logs the selected object-update operation before the update is applied.)
An Example: Call-back functions
void ckpt_callback() {
for (int i=0; i<1000 ; i++)
slach_.ckpt(i, &obj[i], sizeof(obj[i]));
}
void load_one_callback(int64_t id, const void *p, uint32_t size) {
memcpy(&obj[id], p, size);
}
void replay_one_callback(int64_t id, int op, const para_vec& args) {
switch (op) {
case OP_PRICE:
obj[id].price = *(double*)args[0].second;
break;
// ...
}
}
};
- SLACH calls ckpt_callback() during checkpointing.
- SLACH calls load_one_callback() when recovering an object from a checkpoint.
- SLACH calls replay_one_callback() when recovering an object by log replay.
SLACH Implementation and Applications
- Part of Ask.com's middleware infrastructure, in C++, for the data-mining and search offline platform.
- Application samples:
  - UPS (URL property service): records properties of all URLs crawled/collected.
  - HIS (host information service): records properties of all hosts crawled on the web.
- 20-80% of traffic is writes. Running on a cluster of hundreds of machines; in production for the last 3 years.
- Significantly reduced development time: from 1-2 months down to a few days.
Characteristics of UPS/HIS

Performance characteristics of UPS/HIS per partition:

        Data size   Max. read    Max. write
  UPS   1.9 GB      110K req/s   56K req/s
  HIS   2.1 GB      58K req/s    16K req/s

Parameters for adaptive checkpointing control:

  Parameter                     UPS             HIS
  α  (moving average)           0.8             0.8
  LB/UB (lower/upper bound)     1M-8M entries   0.3M-1.8M entries
  LW/HW (low/high watermark)    20%-85%         35%-85%
  β  (scaling)                  3               6
  w  (sampling window)          5s              5s
Evaluation
- Impact of logging overhead.
- System behavior during checkpointing.
- Effectiveness of adaptive checkpoint control.
- Performance comparison of a hash-table implementation using SLACH vs. Berkeley DB.
Evaluation Setting
- Benchmarks: UPS (URL property service), HIS (host-level property service), Persistent Hash Table (PHT).
- Metric: throughput loss percent:
  LossPercent = (1 − SuccessfulRequests / TotalRequests) × 100
- Hardware: a 15-node cluster with gigabit links.
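The metric above computes directly; loss_percent is an illustrative helper name, not part of the evaluation harness:

```cpp
#include <cassert>
#include <cstdint>

/* Throughput-loss metric from the evaluation:
 *   LossPercent = (1 - SuccessfulRequests / TotalRequests) * 100 */
double loss_percent(uint64_t successful, uint64_t total) {
    if (total == 0) return 0.0;  /* no traffic, no loss */
    return (1.0 - static_cast<double>(successful) / total) * 100.0;
}
```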
Selective Logging Overhead of UPS
- Base: logging is disabled.
- Log: selective logging is enabled.
- Negligible impact when server load < 40%.
System Performance During Checkpointing (100% server load)
- During checkpointing: 8.9% throughput drop.
- During checkpointing: 57.6% increase in response time.
Effectiveness of Adaptive Threshold Controller - Performance Comparison in UPS
- Among fixed threshold policies, 8M has lower runtime overhead (less frequent checkpointing).
- The adaptive approach has performance comparable to the fixed 8M policy.
Effectiveness of Threshold Controller - Recovery Speed
- Fixed threshold -> fixed log size -> same recovery time.
- Adaptive approach: small log under light load (less recovery time), larger log under higher load (more recovery time).
PHT vs. Berkeley DB
- With 30-byte values, SLACH throughput is 5.3× higher.
- SLACH is better for all value sizes because: 1) BDB incurs more per-operation overhead; 2) BDB involves more disk I/O.
- SLACH checkpointing has less overhead: 1) BDB checkpointing is not asynchronous; 2) SLACH's fuzzy checkpointing still allows concurrent access.
Conclusions
SLACH contributions:
- A lightweight programming framework for very high-throughput, persistent data services.
  - Simplifies application construction while meeting reliability demands.
  - Selective logging to enhance performance.
- System design with careful integration of multiple techniques.
  - Dynamically adjusts checkpoint frequency to meet throughput demands.
  - Fine-grained checkpointing without service disruption.
- Evaluation of the integrated scheme in production applications.
Data and Failure Models
- Data independence and an object-oriented access model: a key-value store as in Dynamo/BigTable, but with much higher throughput demand per machine.
- Each object is a contiguous memory block; the middleware infrastructure can handle noncontiguous ones.
- Fail-stop failures. Focus on local recovery from application failures; OS/hardware failures can be handled with remote checkpoints (implemented, but outside the scope of this paper).