Yahoo! Research
Johns Hopkins University
Chris Olston, Anish Das Sarma
Xiaodan Wang, Randal Burns
Shared Scan Batch Scheduling in Cloud Computing
Project Goals
Eliminate redundant data processing for concurrent workflows that access the same dataset in the Cloud
Batch MapReduce workflows to enable scan sharing
– Single-pass scan of shared data segments
– Alleviate contention and improve scalability
– Use fewer map/reduce slots under load
Data-intensive workloads (tens of minutes to hours)
– Joins across multiple datasets
– User-specified rewards for early completion
Trade-offs between efficient resource utilization and deadlines
Data-Driven Batch Scheduling
Throughput scales with contention (Astro. & Turbulence)
Decompose queries into sub-queries based on data access
Co-schedule sub-queries to amortize I/O
Evaluate data atoms based on a utility metric
– Reordering based on contention vs. arrival order (CIDR’09)
– Adaptive starvation resistance
– Job-aware scheduling (queries with data dependencies) (SC’10)
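A minimal Python sketch of the decomposition and co-scheduling idea above, using the example queries Q1–Q3 and regions R1–R4 from the figure; the utility metric here is just region contention, a stand-in for the richer metric the scheduler actually uses:

```python
from collections import defaultdict

def co_schedule(queries):
    """Invert query -> regions into region -> queries, so each shared
    region is scanned in a single pass serving all sub-queries."""
    by_region = defaultdict(list)
    for q, regions in queries.items():
        for r in regions:
            by_region[r].append(q)
    # Toy utility metric: serve the most-contended region first,
    # amortizing one I/O pass over the most sub-queries.
    return sorted(by_region.items(), key=lambda kv: -len(kv[1]))

queries = {"Q1": ["R1", "R2", "R3"],
           "Q2": ["R2", "R3", "R4"],
           "Q3": ["R1", "R2"]}
for region, qs in co_schedule(queries):
    print(region, qs)
```

R2 is touched by all three queries, so it is scanned first and that one pass feeds Q1, Q2, and Q3.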
[Figure: decomposition example over the Turbulence DB. Query Q1 accesses regions R1, R2, R3; Q2 accesses R2, R3, R4; Q3 accesses R1, R2. The queries are decomposed by data access, sub-queries are co-scheduled per region (R2: Q1, Q2, Q3; R1: Q1, Q3; R3: Q1, Q2; R4: Q2), batch scheduled, and the query results returned.]
Application in Cloud Computing
Fixed Cloud (fixed resources)
– Single-pass scan of shared data
– Alleviate contention (use fewer map/reduce slots; shared loading and shuffling of data)
– Earn rewards for early completion (soft deadlines)
– Local improvement with simulated annealing, greedy ordering
Elastic Cloud
– Machine charge = (# of machines) x (# of hours)
– Speed-up factors with more machines (i.e., more parallelism)
– Add machines to meet soft deadlines
– Aggressive batching to minimize machine charge (efficiency)
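The elastic-cloud trade-off above can be illustrated with a toy allocator that picks the cheapest machine count meeting a soft deadline; the linear speed-up and whole-hour billing are simplifying assumptions, not details from the slides:

```python
import math

def cheapest_allocation(work_hours, deadline_hours, max_machines=100):
    """Smallest machine charge (machines x whole hours) whose
    idealized linear speed-up finishes by the soft deadline.
    Returns (machines, charge), or None if no count fits."""
    best = None
    for m in range(1, max_machines + 1):
        runtime = work_hours / m      # assume linear speed-up
        if runtime <= deadline_hours:
            charge = m * math.ceil(runtime)
            if best is None or charge < best[1]:
                best = (m, charge)
    return best

# 12 machine-hours of work under a 4-hour soft deadline:
print(cheapest_allocation(12, 4))
```

With real (sub-linear) speed-up factors the search is the same, only the `runtime` model changes.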
Sample Pig
A = load 'input1' as (a, b, c);
B = filter A by a > 5;
store B into 'output1';
C = group B by b;
store C into 'output2';
Nova Workflow Platform
What is Nova?
– Content management and workflow scheduling for the Cloud
– Leverages existing resources
Cloud data: HDFS/Zebra storage
Cloud computing: Oozie, Pig/MR/Hadoop
Users define complex workflows in Oozie that consume the data
[Figure: software stack. Storage: HDFS; Processing: Hadoop Map-Reduce; Dataflow: Pig; Simple workflow: Oozie; Advanced workflow: Nova, with Apps 1-3 on top.]
Oozie: a workflow engine for coordinating MapReduce/Pig jobs in Hadoop (i.e., a workflow DAG in which nodes are MR tasks and edges are dataflows)
Sample Nova Workflow
[Figure: example Nova workflow. A crawler produces crawled pages (url, content); a candidate entity extractor emits candidate entity occurrences (url, entity string); editors supply entities (entity id, entity string); a join yields validated entity occurrences (url, entity id); a group-wise count produces entity occurrence counts (entity id, count), which are the output. The processing steps are Nova tasks and the datasets between them are Nova data.]
Shared Scan via Workflow Merging
[Figure: Nova Workflow 1 (Pig/MR tasks over input datasets c1s0 and c2s0, producing c3s0 and c4s0) and Nova Workflow 2 (a task over c2s0, producing c5s0) are combined by the WorkflowMerger into Nova Workflow 1.2, which scans the shared input c2s0 once.]
Sample Use Cases in Nova
– Concurrent research, production, and maintenance workflows over the same data
– Content enrichment workflows (e.g., de-dup, clustering) over news content
– Webmap workflows consuming the same URL table
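A hedged sketch of what the WorkflowMerger does conceptually: group tasks from several workflows by their input dataset, so a shared input such as c2s0 is scanned once for all of its consumers. The task names are hypothetical:

```python
def merge_workflows(*workflows):
    """Group tasks from several workflows by input dataset, so each
    shared input is scanned exactly once and all consuming tasks
    are fed from that single pass."""
    merged = {}
    for wf in workflows:
        for task, inp in wf.items():
            merged.setdefault(inp, []).append(task)
    return merged  # input dataset -> tasks sharing its scan

# Hypothetical task names; dataset names modeled on the slide:
wf1 = {"wf1-task-a": "c2s0", "wf1-task-b": "c1s0"}
wf2 = {"wf2-task-a": "c2s0", "wf2-task-b": "c2s0"}
merged = merge_workflows(wf1, wf2)
print(merged["c2s0"])  # all three c2s0 consumers share one scan
```

The real merger rewrites the Pig/MR plans themselves (see the merged map/reduce figure below in the deck); this sketch only shows the grouping decision.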
[Figure: a merged MapReduce job. The input data is split per tuple and routed to each constituent job's nested plan in the map phase (Map 1 ... Map n), combined, shuffled, then demultiplexed per tuple to each job's nested plan in the reduce phase (Reduce 1 ... Reduce m), producing each job's output data.]
Performance impact:
(1) Shared loading (network, redundant processing)
(2) Consolidated computation (shared startup/tear-down)
(3) Reducer parallelism (Max/Sum # of reducers)
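Effect (3) can be made concrete with a toy calculation. Whether a merged job takes the max or the sum of the original jobs' reducer counts is a trade-off between conserving slots and preserving parallelism; reading the slide's "Max/Sum" as exactly that choice is an interpretation, not something the deck states:

```python
def merged_reducers(reducer_counts, policy="max"):
    """Reducer-slot count for a merged job. 'max' conserves slots
    (each job's keys spread over all reducers via the demux),
    'sum' preserves the combined parallelism of the originals."""
    return max(reducer_counts) if policy == "max" else sum(reducer_counts)

print(merged_reducers([40, 25, 25]))         # 40 slots instead of 90
print(merged_reducers([40, 25, 25], "sum"))  # 90
```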
Completion Time by Scheduling Strategy
[Chart: completion time in ms (0 to 3,000,000) vs. number of shingling workflows (1 to 6), comparing Sequential-NoMerge, Concurrent-NoMerge, and Merged.]
Performance in Nova for different enrichment workflows (i.e., de-dup) on news content (SIGMOD’11)
Utilization of Grid Resources (Slot Time)
[Chart: slot time in ms (0 to 4,000,000) vs. number of shingling workflows (1 to 7), comparing Concurrent-NoMerge Map, Concurrent-NoMerge Reduce, Merged Map, and Merged Reduce.]
PigMix: Load Cost Savings
PigMix: Estimating Makespan
Ongoing Work
Starvation resistance
– Account for heterogeneity in workflow sizes
– Provide soft deadline guarantees
– Handle cascading failures
– Prefer jobs with a high load cost (less dilation, high slot-time savings, map-only jobs)
Predicting workflow runtime and frequency
– Robustness to inaccuracies in cost estimates
– Conserve or expend Cloud resources based on deadline requirements and system load
Jobs that join/scan multiple input sources
Questions?
Nova Workflow Platform
Nova features
– Abstraction for complex workflows that consume data
Incrementally arriving data (logs, crawls, feeds, ...)
Incremental processing of arriving data
– Stateless: shingle every newly crawled page
– Stateful: maintain inlink counts as the web grows
Scheduling of processing steps
– Periodic: run the inlink counter once per week
– Triggered: run the inlink counter after the link extractor
– Provides provenance, metadata management, incremental processing (e.g., joins), data replication, and transactional guarantees
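The stateless/stateful distinction above can be sketched in Python; the shingle size and class names are illustrative, not Nova's actual implementation:

```python
def shingle(page_text, k=4):
    """Stateless step: each newly crawled page is processed on its
    own (here, k-word shingles); no state is carried between runs."""
    words = page_text.split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

class InlinkCounter:
    """Stateful step: inlink counts are maintained incrementally as
    new link batches arrive, rather than recomputed from scratch."""
    def __init__(self):
        self.counts = {}
    def absorb(self, links):
        # links: iterable of (source_url, target_url) pairs
        for _, dst in links:
            self.counts[dst] = self.counts.get(dst, 0) + 1
        return self.counts
```

A stateless step can be restarted or re-run freely; the stateful one needs its accumulated counts preserved between batches, which is what Nova's state management provides.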
PigMix: Reducer Parallelism
Optimizing for Shared Scan
Define a job J (i.e., a MapReduce or Pig job)
– J scans files f(J) = {F1, …, Fk}, with scan time s(Fi) per file
– Fixed processing cost c(J)
d(J) defines a soft deadline for each job
– Step: d is given by n pairs (ti, pi) with 0 < ti < ti+1 and pi > pi+1; a job that completes by ti is awarded pi points
– Linear decay: enforce eventual completion with negative points
Cost of a shared scan for jobs J1 and J2:
c(J1) + c(J2) + Σ_{F ∈ f(J1) ∪ f(J2)} s(F)
Maximize points and minimize resources
– Local improvement with simulated annealing, greedy ordering
– Aggressive batching when load is high
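The cost and step-reward definitions above can be made concrete; a minimal sketch (Python in place of the scheduler's real implementation, with illustrative file names and times):

```python
def shared_scan_cost(jobs, scan_time):
    """Cost of running a batch as one shared scan:
    c(J1) + ... + c(Jn) plus s(F) summed over the UNION of the
    jobs' files, so each shared file is scanned only once."""
    files = set().union(*(j["files"] for j in jobs))
    return sum(j["cost"] for j in jobs) + sum(scan_time[f] for f in files)

def step_reward(deadline_steps, completion):
    """Step soft deadline: pairs (t_i, p_i) with increasing t_i and
    decreasing p_i; a job completing by t_i earns p_i points."""
    for t, p in deadline_steps:
        if completion <= t:
            return p
    return 0  # the linear-decay variant would go negative here

scan = {"F1": 5, "F2": 7}
j1 = {"cost": 3, "files": {"F1", "F2"}}
j2 = {"cost": 4, "files": {"F2"}}
print(shared_scan_cost([j1, j2], scan))        # 3 + 4 + 5 + 7 = 19
print(step_reward([(10, 100), (20, 50)], 12))  # misses t1, makes t2: 50
```

Run separately, the two jobs would pay s(F2) twice; sharing the scan pays it once, which is exactly the union term in the formula.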
Performance Evaluation
Experimental Setup
– Nova with the Shared Scan Module
– 200-node Hadoop cluster
128 MB HDFS block size
1 GB RAM per node
640 mapper and 320 reducer slots
– Shingling workflow (offline content enrichment)
De-duplication of news content
Filter and extract features from content
Cluster content by feature and pick one item per cluster
Execute multiple de-dup workflows using different clustering algorithms
– Scheduling strategies compared
Sequential-NoMerge (slower, conserves Grid resources)
Concurrent-NoMerge (fast, elastic Grid resources)
Merged (fast, conserves Grid resources)