Yahoo! Research
Johns Hopkins University
Chris Olston, Anish Das Sarma
Xiaodan Wang, Randal Burns
Shared Scan Batch Scheduling in Cloud Computing
Project Goals
Eliminate redundant data processing for concurrent workflows that access the same dataset in the Cloud
Batch MapReduce workflows to enable scan sharing
– Single-pass scan of shared data segments
– Alleviate contention and improve scalability
– Use fewer map/reduce slots under load
Data-intensive workloads (tens of minutes to hours)
– Joins across multiple datasets
– User-specified rewards for early completion
Trade-offs between efficient resource utilization and deadlines
Data-Driven Batch Scheduling
Throughput scales with contention (Astro. & Turbulence)
Decompose queries into sub-queries based on data access
Co-schedule sub-queries to amortize I/O
Evaluate data atoms based on a utility metric
– Reordering based on contention vs. arrival order (CIDR’09)
– Adaptive starvation resistance
– Job-aware scheduling (queries with data dependencies) (SC’10)
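A minimal Python sketch of the decomposition and co-scheduling idea above, using the example queries Q1–Q3 and regions R1–R4 from the figure; the utility metric here is just region contention, a stand-in for the richer metric the scheduler actually uses:

```python
from collections import defaultdict

def co_schedule(queries):
    """Invert query -> regions into region -> queries, so each shared
    region is scanned in a single pass serving all sub-queries."""
    by_region = defaultdict(list)
    for q, regions in queries.items():
        for r in regions:
            by_region[r].append(q)
    # Toy utility metric: serve the most-contended region first,
    # amortizing one I/O pass over the most sub-queries.
    return sorted(by_region.items(), key=lambda kv: -len(kv[1]))

queries = {"Q1": ["R1", "R2", "R3"],
           "Q2": ["R2", "R3", "R4"],
           "Q3": ["R1", "R2"]}
for region, qs in co_schedule(queries):
    print(region, qs)
```

R2 is touched by all three queries, so it is scanned first and that one pass feeds Q1, Q2, and Q3.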
[Figure: decomposition example over the Turbulence DB. Query Q1 accesses regions R1, R2, R3; Q2 accesses R2, R3, R4; Q3 accesses R1, R2. The queries are decomposed by data access, sub-queries are co-scheduled per region (R2: Q1, Q2, Q3; R1: Q1, Q3; R3: Q1, Q2; R4: Q2), batch scheduled, and the query results returned.]
Application in Cloud Computing
Fixed Cloud (fixed resources)
– Single-pass scan of shared data
– Alleviate contention (use fewer map/reduce slots; shared loading and shuffling of data)
– Earn rewards for early completion (soft deadlines)
– Local improvement with simulated annealing, greedy ordering
Elastic Cloud
– Machine charge = (# of machines) x (# of hours)
– Speed-up factors with more machines (i.e., more parallelism)
– Add machines to meet soft deadlines
– Aggressive batching to minimize machine charge (efficiency)
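The elastic-cloud trade-off above can be illustrated with a toy allocator that picks the cheapest machine count meeting a soft deadline; the linear speed-up and whole-hour billing are simplifying assumptions, not details from the slides:

```python
import math

def cheapest_allocation(work_hours, deadline_hours, max_machines=100):
    """Smallest machine charge (machines x whole hours) whose
    idealized linear speed-up finishes by the soft deadline.
    Returns (machines, charge), or None if no count fits."""
    best = None
    for m in range(1, max_machines + 1):
        runtime = work_hours / m      # assume linear speed-up
        if runtime <= deadline_hours:
            charge = m * math.ceil(runtime)
            if best is None or charge < best[1]:
                best = (m, charge)
    return best

# 12 machine-hours of work under a 4-hour soft deadline:
print(cheapest_allocation(12, 4))
```

With real (sub-linear) speed-up factors the search is the same, only the `runtime` model changes.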
Sample Pig
A = load 'input1' as (a, b, c);
B = filter A by a > 5;
store B into 'output1';
C = group B by b;
store C into 'output2';
Nova Workflow Platform
What is Nova?
– Content management and workflow scheduling for the Cloud
– Leverages existing resources
Cloud data: HDFS/Zebra storage
Cloud computing: Oozie, Pig/MR/Hadoop
Users define complex workflows in Oozie that consume the data
[Figure: software stack. Storage: HDFS; Processing: Hadoop Map-Reduce; Dataflow: Pig; Simple workflow: Oozie; Advanced workflow: Nova, with Apps 1-3 on top.]
Oozie: a workflow engine for coordinating MapReduce/Pig jobs in Hadoop (i.e., a workflow DAG in which nodes are MR tasks and edges are dataflows)
Sample Nova Workflow
[Figure: example Nova workflow. A crawler produces crawled pages (url, content); a candidate entity extractor emits candidate entity occurrences (url, entity string); editors supply entities (entity id, entity string); a join yields validated entity occurrences (url, entity id); a group-wise count produces entity occurrence counts (entity id, count), which are the output. The processing steps are Nova tasks and the datasets between them are Nova data.]
Shared Scan via Workflow Merging
[Figure: Nova Workflow 1 (Pig/MR tasks over input datasets c1s0 and c2s0, producing c3s0 and c4s0) and Nova Workflow 2 (a task over c2s0, producing c5s0) are combined by the WorkflowMerger into Nova Workflow 1.2, which scans the shared input c2s0 once.]
Sample Use Cases in Nova
– Concurrent research, production, and maintenance workflows over the same data
– Content enrichment workflows (e.g., de-dup, clustering) over news content
– Webmap workflows consuming the same URL table
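A hedged sketch of what the WorkflowMerger does conceptually: group tasks from several workflows by their input dataset, so a shared input such as c2s0 is scanned once for all of its consumers. The task names are hypothetical:

```python
def merge_workflows(*workflows):
    """Group tasks from several workflows by input dataset, so each
    shared input is scanned exactly once and all consuming tasks
    are fed from that single pass."""
    merged = {}
    for wf in workflows:
        for task, inp in wf.items():
            merged.setdefault(inp, []).append(task)
    return merged  # input dataset -> tasks sharing its scan

# Hypothetical task names; dataset names modeled on the slide:
wf1 = {"wf1-task-a": "c2s0", "wf1-task-b": "c1s0"}
wf2 = {"wf2-task-a": "c2s0", "wf2-task-b": "c2s0"}
merged = merge_workflows(wf1, wf2)
print(merged["c2s0"])  # all three c2s0 consumers share one scan
```

The real merger rewrites the Pig/MR plans themselves (see the merged map/reduce figure below in the deck); this sketch only shows the grouping decision.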
[Figure: a merged MapReduce job. The input data is split per tuple and routed to each constituent job's nested plan in the map phase (Map 1 ... Map n), combined, shuffled, then demultiplexed per tuple to each job's nested plan in the reduce phase (Reduce 1 ... Reduce m), producing each job's output data.]
Performance impact:
(1) Shared loading (network, redundant processing)
(2) Consolidated computation (shared startup/tear-down)
(3) Reducer parallelism (Max/Sum # of reducers)
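Effect (3) can be made concrete with a toy calculation. Whether a merged job takes the max or the sum of the original jobs' reducer counts is a trade-off between conserving slots and preserving parallelism; reading the slide's "Max/Sum" as exactly that choice is an interpretation, not something the deck states:

```python
def merged_reducers(reducer_counts, policy="max"):
    """Reducer-slot count for a merged job. 'max' conserves slots
    (each job's keys spread over all reducers via the demux),
    'sum' preserves the combined parallelism of the originals."""
    return max(reducer_counts) if policy == "max" else sum(reducer_counts)

print(merged_reducers([40, 25, 25]))         # 40 slots instead of 90
print(merged_reducers([40, 25, 25], "sum"))  # 90
```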
Completion Time by Scheduling Strategy
[Chart: completion time in ms (0 to 3,000,000) vs. number of shingling workflows (1 to 6), comparing Sequential-NoMerge, Concurrent-NoMerge, and Merged.]
Performance in Nova for different enrichment workflows (i.e., de-dup) on news content (SIGMOD’11)
Utilization of Grid Resources (Slot Time)
[Chart: slot time in ms (0 to 4,000,000) vs. number of shingling workflows (1 to 7), comparing Concurrent-NoMerge Map, Concurrent-NoMerge Reduce, Merged Map, and Merged Reduce.]
PigMix: Load Cost Savings
PigMix: Estimating Makespan
Ongoing Work
Starvation resistance
– Account for heterogeneity in workflow sizes
– Provide soft deadline guarantees
– Handle cascading failures
– Prefer jobs with a high load cost (less dilation, high slot-time savings, map-only jobs)
Predicting workflow runtime and frequency
– Robustness to inaccuracies in cost estimates
– Conserve or expend Cloud resources based on deadline requirements and system load
Jobs that join/scan multiple input sources
Questions?
Nova Workflow Platform
Nova features
– Abstraction for complex workflows that consume data
Incrementally arriving data (logs, crawls, feeds, ...)
Incremental processing of arriving data
– Stateless: shingle every newly crawled page
– Stateful: maintain inlink counts as the web grows
Scheduling of processing steps
– Periodic: run the inlink counter once per week
– Triggered: run the inlink counter after the link extractor
– Provides provenance, metadata management, incremental processing (e.g., joins), data replication, and transactional guarantees
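The stateless/stateful distinction above can be sketched in Python; the shingle size and class names are illustrative, not Nova's actual implementation:

```python
def shingle(page_text, k=4):
    """Stateless step: each newly crawled page is processed on its
    own (here, k-word shingles); no state is carried between runs."""
    words = page_text.split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

class InlinkCounter:
    """Stateful step: inlink counts are maintained incrementally as
    new link batches arrive, rather than recomputed from scratch."""
    def __init__(self):
        self.counts = {}
    def absorb(self, links):
        # links: iterable of (source_url, target_url) pairs
        for _, dst in links:
            self.counts[dst] = self.counts.get(dst, 0) + 1
        return self.counts
```

A stateless step can be restarted or re-run freely; the stateful one needs its accumulated counts preserved between batches, which is what Nova's state management provides.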
PigMix: Reducer Parallelism
Optimizing for Shared Scan
Define a job J (i.e., a MapReduce or Pig job)
– J scans files f(J) = {F1, …, Fk}, with scan time s(Fi) per file
– Fixed processing cost c(J)
d(J) defines a soft deadline for each job
– Step: d is given by n pairs (ti, pi) with 0 < ti < ti+1 and pi > pi+1; a job that completes by ti is awarded pi points
– Linear decay: enforce eventual completion with negative points
Cost of a shared scan for jobs J1 and J2:
c(J1) + c(J2) + Σ_{F ∈ f(J1) ∪ f(J2)} s(F)
Maximize points and minimize resources
– Local improvement with simulated annealing, greedy ordering
– Aggressive batching when load is high
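The cost and step-reward definitions above can be made concrete; a minimal sketch (Python in place of the scheduler's real implementation, with illustrative file names and times):

```python
def shared_scan_cost(jobs, scan_time):
    """Cost of running a batch as one shared scan:
    c(J1) + ... + c(Jn) plus s(F) summed over the UNION of the
    jobs' files, so each shared file is scanned only once."""
    files = set().union(*(j["files"] for j in jobs))
    return sum(j["cost"] for j in jobs) + sum(scan_time[f] for f in files)

def step_reward(deadline_steps, completion):
    """Step soft deadline: pairs (t_i, p_i) with increasing t_i and
    decreasing p_i; a job completing by t_i earns p_i points."""
    for t, p in deadline_steps:
        if completion <= t:
            return p
    return 0  # the linear-decay variant would go negative here

scan = {"F1": 5, "F2": 7}
j1 = {"cost": 3, "files": {"F1", "F2"}}
j2 = {"cost": 4, "files": {"F2"}}
print(shared_scan_cost([j1, j2], scan))        # 3 + 4 + 5 + 7 = 19
print(step_reward([(10, 100), (20, 50)], 12))  # misses t1, makes t2: 50
```

Run separately, the two jobs would pay s(F2) twice; sharing the scan pays it once, which is exactly the union term in the formula.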
Performance Evaluation
Experimental Setup
– Nova with the Shared Scan Module
– 200-node Hadoop cluster
128 MB HDFS block size
1 GB RAM per node
640 mapper and 320 reducer slots
– Shingling workflow (offline content enrichment)
De-duplication of news content
Filter and extract features from content
Cluster content by feature and pick one item per cluster
Execute multiple de-dup workflows using different clustering algorithms
– Scheduling strategies compared
Sequential-NoMerge (slower, conserves Grid resources)
Concurrent-NoMerge (fast, elastic Grid resources)
Merged (fast, conserves Grid resources)