TRANSCRIPT
PROGRAMMING SUPPORT AND ADAPTIVE CHECKPOINTING FOR HIGH-THROUGHPUT DATA SERVICES WITH LOG-BASED RECOVERY
DSN 2010
Jingyu Zhou, Shanghai Jiao Tong Univ.
Caijie Zhang, Google Inc.
Hong Tang, Yahoo!
Jiesheng Wu, Microsoft
Tao Yang, UC Santa Barbara
Background
- Many large-scale data-mining and offline applications at Google, Yahoo, Microsoft, Ask.com, etc. require:
  - High data parallelism/throughput
  - Data persistence
  - But not stringent availability
- E.g., the URL property service (UPS) in the Ask.com search offline mining platform: hundreds of application modules access UPS.
Examples of high-throughput data services for web mining/search
[Figure: data flow — crawlers fetch Internet web documents into document DBs; data-mining jobs read the DBs and query data/info services, e.g., a URL property service covering 10-50 billion URLs at 100K-500K requests/s.]
Existing Approaches for High Performance and Persistence
- Database systems suffer from high overhead; supporting general features limits their performance and requires more machine resources.
- Related work and well-known techniques for high availability: data replication, log-based recovery, checkpointing.
Challenges and Focus of This Work
- System design with careful selection and integration of fault-tolerant techniques for high-throughput computing. Trades off availability: some downtime is allowed.
- Low-cost logging/checkpointing.
- Fine-grained operation for minimum service disruption; local data recovery; periodic remote backup.
- Programming support: lightweight, simplifying the construction of robust data services.
SLACH: Selective Logging & Adaptive CHeckpointing
- Targeted data services: request-driven thread model, in-memory objects, data independence.
- Similar to key-value stores in BigTable/Dynamo, but with higher throughput demands.
Architecture of SLACH
Main Techniques
- Selective operation logging: only log write operations, as (oid, op_type, parameters, timestamp). Write-ahead log, i.e., write the log entry, then apply the operation.
- Object-level checkpointing to avoid service disruption, with adaptive load control: checkpoint objects one by one, still allowing concurrent access to other objects; perform checkpointing when load is low to amortize its cost.
- Lightweight API while supporting legacy code.
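The write-ahead log record described above, (oid, op_type, parameters, timestamp), can be sketched as follows. The struct layout, field widths, and the append_record helper are illustrative assumptions, not SLACH's actual on-disk format:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

/* Hypothetical layout for one write-ahead log record. In a real
 * implementation the record is flushed to disk BEFORE the operation
 * is applied to the in-memory object ("write then apply"). */
struct LogRecord {
    int64_t  oid;        /* object id */
    int32_t  op_type;    /* application-defined operation code */
    uint32_t param_len;  /* length of serialized parameters in bytes */
    int64_t  timestamp;  /* e.g. microseconds since epoch */
};

/* Append one record header followed by its parameter bytes to an
 * in-memory log buffer. */
void append_record(std::vector<char>& log, const LogRecord& rec,
                   const void* params) {
    const char* hdr = reinterpret_cast<const char*>(&rec);
    log.insert(log.end(), hdr, hdr + sizeof(rec));
    const char* p = static_cast<const char*>(params);
    log.insert(log.end(), p, p + rec.param_len);
}
```

Replay then walks the buffer record by record, dispatching on op_type, which is what the application's replay callback (shown later) does per operation.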
Object-level Checkpoints
Adaptive Checkpointing Control
- Goal: balance checkpoint cost against recovery speed.
  - Checkpointing less frequently -> larger logs -> lengthy recovery.
  - Checkpointing too often -> higher overhead.
- Ideally: high server load -> checkpoint less frequently; low server load -> checkpoint more frequently.
- Adjust between a Low Watermark (LW) and a High Watermark (HW) of service load:
  Load_curr = α × Load_prev + (1 − α) × sample
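The smoothing formula above is a standard exponentially weighted moving average. A minimal sketch (the LoadEstimator class name is illustrative; the slides report α = 0.8 and a 5 s sampling window for UPS/HIS):

```cpp
#include <cassert>

/* Exponentially weighted moving average of server load:
 *   Load_curr = alpha * Load_prev + (1 - alpha) * sample
 * A larger alpha smooths more but reacts more slowly to load changes. */
class LoadEstimator {
public:
    explicit LoadEstimator(double alpha) : alpha_(alpha), load_(0.0) {}

    /* Feed one load sample (e.g. fraction of capacity used in the
     * last sampling window) and return the updated average. */
    double update(double sample) {
        load_ = alpha_ * load_ + (1.0 - alpha_) * sample;
        return load_;
    }

    double current() const { return load_; }

private:
    double alpha_;  /* smoothing factor, e.g. 0.8 */
    double load_;   /* current smoothed load estimate */
};
```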
Adaptive Checkpointing Frequency
- Checkpoint threshold lies between LB and UB, where LB and UB are log-size parameters determined by the application:
  Threshold = LB + F(load) × (UB − LB)
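A sketch of the threshold computation above. The paper defines F precisely (involving the watermarks and the scaling factor β); here we assume a simple linear ramp clamped at the watermarks, which captures the intent — high load pushes the threshold toward UB (checkpoint less often), low load toward LB (checkpoint more often):

```cpp
#include <algorithm>
#include <cassert>

/* Compute the adaptive checkpoint threshold:
 *   Threshold = LB + F(load) * (UB - LB)
 * where F maps load in [LW, HW] onto [0, 1]. This linear F is an
 * assumption for illustration; SLACH's actual F may differ. */
double ckpt_threshold(double load, double lw, double hw,
                      double lb, double ub) {
    double f = (load - lw) / (hw - lw);   /* linear ramp over [LW, HW] */
    f = std::min(1.0, std::max(0.0, f));  /* clamp to [0, 1] */
    return lb + f * (ub - lb);
}
```

For example, with the UPS parameters reported later (LW/HW = 20%/85%, LB/UB = 1M/8M log entries), a fully loaded server would checkpoint only once the log reaches about 8M entries, while an idle one would checkpoint at about 1M.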
SLACH Programming Support
- Application developers:
  - Call SLACH function log() to log an object operation.
  - Define 3 callback functions: 1) what to checkpoint (call SLACH's ckpt() for each selected object); 2) recover one object from a checkpoint; 3) replay a log operation.
- SLACH:
  - Provides functions log() and ckpt().
  - Calls the user's checkpoint callback during checkpointing, the user's recover function during checkpoint recovery, and the user's replay function when recovering from a log.
SLACH API for Applications
class SLACH::API {
public:
/* register ckpt. policy and parameters */
void register_policy(const Policy& p);
/* log one write operation */
void log(int64_t obj_id, int op, ...);
/* checkpoint one object */
void ckpt(int64_t obj_id, const void* addr, uint32_t size);
};
SLACH Interface
class SLACH::Application {
…
protected:
/* application checkpoint callback function */
virtual void ckpt_callback()=0;
/* callback of loading one object checkpoint*/
virtual void load_one_callback(int64_t obj_id, const void* addr, uint32_t size)=0;
/* callback of replaying one operation log */
virtual void replay_one_callback(int64_t obj_id, int op, const para_vec& args)=0;
};
An Example: Application-level code

struct Item {
double price;
int quantity;
};
class MyService : public SLACH::Application {
private:
Item obj[1000];
SLACH::API slach_; /* SLACH API */
static const int OP_PRICE=0;/* an op type */
public:
void update_price(int id, double p) {
slach_.log(id, OP_PRICE, &p, sizeof(p));
obj[id].price = p;
}
(Annotations: obj[] holds the application objects being accessed; slach_.log() logs the selected object-update operation before the update is applied.)
An Example: Call-back functions
void ckpt_callback() {
for (int i=0; i<1000 ; i++)
slach_.ckpt(i, &obj[i], sizeof(obj[i]));
}
void load_one_callback(int64_t id, const void *p, uint32_t size) {
memcpy(&obj[id], p, size);
}
void replay_one_callback(int64_t id, int op, const para_vec& args) {
switch (op) {
case OP_PRICE:
obj[id].price = *(double*)args[0].second;
break;
// ...
}
}
};
- SLACH calls ckpt_callback() during checkpointing.
- SLACH calls load_one_callback() when recovering an object from a checkpoint.
- SLACH calls replay_one_callback() when recovering an object by log replay.
SLACH Implementation and Applications
- Part of Ask.com's middleware infrastructure, in C++, for the data-mining and search offline platform.
- Application samples:
  - UPS (URL property service): records properties of all URLs crawled/collected.
  - HIS (host information service): records properties of all hosts crawled on the web.
- 20-80% of traffic is writes. Running on a cluster of hundreds of machines; in production for the last 3 years.
- Significantly reduced development time: from 1-2 months down to a few days.
Characteristics of UPS/HIS

Performance characteristics of UPS/HIS per partition:

        Data size   Max. read    Max. write
  UPS   1.9 GB      110K req/s   56K req/s
  HIS   2.1 GB      58K req/s    16K req/s

Parameters for adaptive checkpointing control:

  Parameter                     UPS             HIS
  α  (moving average)           0.8             0.8
  LB/UB (lower/upper bound)     1M-8M entries   0.3M-1.8M entries
  LW/HW (low/high watermark)    20%-85%         35%-85%
  β  (scaling)                  3               6
  w  (sampling window)          5s              5s
Evaluation
- Impact of logging overhead.
- System behavior during checkpointing.
- Effectiveness of adaptive checkpoint control.
- Performance comparison of a hash-table implementation using SLACH vs. Berkeley DB.
Evaluation Setting
- Benchmarks: UPS (URL property service), HIS (host-level property service), Persistent Hash Table (PHT).
- Metric: throughput loss percent:
  LossPercent = (1 − SuccessfulRequests / TotalRequests) × 100
- Hardware: a 15-node cluster with gigabit links.
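The metric above computes directly; loss_percent is an illustrative helper name, not part of the evaluation harness:

```cpp
#include <cassert>
#include <cstdint>

/* Throughput-loss metric from the evaluation:
 *   LossPercent = (1 - SuccessfulRequests / TotalRequests) * 100 */
double loss_percent(uint64_t successful, uint64_t total) {
    if (total == 0) return 0.0;  /* no traffic, no loss */
    return (1.0 - static_cast<double>(successful) / total) * 100.0;
}
```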
Selective Logging Overhead of UPS
- Base: logging is disabled.
- Log: selective logging is enabled.
- Negligible impact when server load < 40%.
System Performance During Checkpointing (100% server load)
- During checkpointing: 8.9% throughput drop.
- During checkpointing: 57.6% increase in response time.
Effectiveness of Adaptive Threshold Controller - Performance Comparison in UPS
- Among fixed threshold policies, 8M has lower runtime overhead (less frequent checkpointing).
- The adaptive approach has performance comparable to the fixed 8M policy.
Effectiveness of Threshold Controller - Recovery Speed
- Fixed threshold -> fixed log size -> same recovery time.
- Adaptive approach: small log under light load (less recovery time), larger log under higher load (more recovery time).
PHT vs. Berkeley DB
- With 30-byte values, SLACH throughput is 5.3× higher.
- SLACH is better for all value sizes because: 1) BDB incurs more per-operation overhead; 2) BDB involves more disk I/O.
- SLACH checkpointing has less overhead: 1) BDB checkpointing is not asynchronous; 2) SLACH's fuzzy checkpointing still allows concurrent access.
Conclusions
SLACH contributions:
- A lightweight programming framework for very high-throughput, persistent data services.
  - Simplifies application construction while meeting reliability demands.
  - Selective logging to enhance performance.
- System design with careful integration of multiple techniques.
  - Dynamically adjusts checkpoint frequency to meet throughput demands.
  - Fine-grained checkpointing without service disruption.
- Evaluation of the integrated scheme in production applications.
Data and Failure Models
- Data independence and an object-oriented access model: a key-value store as in Dynamo/BigTable, but with much higher throughput demand per machine.
- Each object is a contiguous memory block; the middleware infrastructure can handle noncontiguous ones.
- Fail-stop failures. Focus on local recovery from application failures; OS/hardware failures can be handled with remote checkpoints (implemented, but outside the scope of this paper).