Explicit Control in a Batch-aware Distributed File System

John Bent, Douglas Thain, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Miron Livny
University of Wisconsin, Madison


TRANSCRIPT

Page 1: Explicit Control in a  Batch-aware Distributed  File System

Explicit Control in a Batch-aware Distributed File System

John Bent Douglas Thain

Andrea Arpaci-Dusseau Remzi Arpaci-Dusseau

Miron Livny

University of Wisconsin, Madison

Page 2: Explicit Control in a  Batch-aware Distributed  File System

Grid computing

Physicists invent distributed computing!

Astronomers develop virtual supercomputers!

Page 3: Explicit Control in a  Batch-aware Distributed  File System

Grid computing

[Diagram: home storage connected over the Internet to remote compute clusters.]

If it looks like a duck . . .

Page 4: Explicit Control in a  Batch-aware Distributed  File System

Are existing distributed file systems adequate for batch computing workloads?

• NO. Internal decisions are inappropriate
  • Caching, consistency, replication

• A solution: the Batch-Aware Distributed File System (BAD-FS)
  • Combines knowledge with external storage control
    • Detailed information about the workload is known
    • The storage layer allows external control
    • An external scheduler makes informed storage decisions

• Combining information and control results in
  • Improved performance
  • More robust failure handling
  • Simplified implementation

Page 5: Explicit Control in a  Batch-aware Distributed  File System

Outline

• Introduction
• Batch computing
  • Systems
  • Workloads
  • Environment
  • Why not DFS?
• Our answer: BAD-FS
  • Design
  • Experimental evaluation
• Conclusion

Page 6: Explicit Control in a  Batch-aware Distributed  File System

Batch computing

• Not interactive computing
  • Users submit job descriptions; the system itself executes them
• Job description languages
• Many different batch systems
  • Condor
  • LSF
  • PBS
  • Sun Grid Engine
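To make the submit-and-execute model concrete, here is a minimal sketch of a job description as a plain Python record; the field names are illustrative and do not follow any particular batch system's syntax.

    # Hypothetical, minimal job description: a declarative record that the
    # user submits and the batch system later executes on compute nodes.
    job = {
        "executable": "simulate",               # program to run
        "arguments": ["--config", "run.cfg"],
        "inputs": ["run.cfg", "dataset.dat"],   # files the job reads
        "outputs": ["result.out"],              # files the job writes
        "instances": 64,                        # queue 64 copies of this job
    }

    def submit(job):
        """Hand the description to the batch scheduler; the system, not the
        user, decides where and when each instance runs."""
        for i in range(job["instances"]):
            print(f"queued instance {i} of {job['executable']}")

    submit(job)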

Page 7: Explicit Control in a  Batch-aware Distributed  File System

Batch computing

[Diagram: a batch scheduler holds a job queue (jobs 1-4) and dispatches them over the Internet to compute nodes, each running a CPU manager; input and output data reside on home storage.]

Page 8: Explicit Control in a  Batch-aware Distributed  File System

Batch workloads

• General properties
  • Large number of processes
  • Process and data dependencies
  • I/O intensive

• Different types of I/O (sketched below)
  • Endpoint
  • Batch
  • Pipeline

• Our focus: scientific workloads
  • More generally applicable
  • Many others use batch computing
    • Video production, data mining, electronic design, financial services, graphic rendering

"Pipeline and Batch Sharing in Grid Workloads," Douglas Thain, John Bent, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Miron Livny. HPDC 12, 2003.
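To make the three I/O types concrete, here is a small sketch of how a two-stage pipeline might be described, with each file tagged as endpoint, batch, or pipeline data. The structure and names are illustrative, not the actual BAD-FS description language.

    # Illustrative workload description: jobs with dependencies, and each
    # file tagged by I/O type. (Hypothetical structure, not BAD-FS syntax.)
    workload = {
        "files": {
            "calibration.db": "batch",     # shared read-only input, reused by every pipeline
            "stage1.tmp":     "pipeline",  # passed between jobs in one pipeline, then discarded
            "final.out":      "endpoint",  # the only data the user needs back at home storage
        },
        "jobs": [
            {"name": "stage1", "reads": ["calibration.db"], "writes": ["stage1.tmp"], "after": []},
            {"name": "stage2", "reads": ["stage1.tmp"], "writes": ["final.out"], "after": ["stage1"]},
        ],
    }

    # Only endpoint files ever need to cross the wide area back to home storage.
    endpoint_files = [f for f, kind in workload["files"].items() if kind == "endpoint"]
    print(endpoint_files)   # ['final.out']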

Page 9: Explicit Control in a  Batch-aware Distributed  File System

Batch workloads

[Diagram: a batch-pipeline workload. Vertical chains of jobs pass pipeline data from stage to stage, every pipeline reads shared batch datasets, and endpoint data enters at the top and leaves at the bottom of the workload.]

Page 10: Explicit Control in a  Batch-aware Distributed  File System

Cluster-to-cluster (c2c)

• Not quite p2p
  • More organized
  • Less hostile
  • More homogeneity
  • Correlated failures

• Each cluster is autonomous
  • Run and managed by different entities

• An obvious bottleneck is the wide-area link

How do we manage the flow of data into, within, and out of these clusters?

[Diagram: a home store connected over the Internet to several autonomous clusters.]

Page 11: Explicit Control in a  Batch-aware Distributed  File System

Why not DFS?

• A distributed file system would be ideal
  • Easy to use
  • Uniform name space
  • Designed for wide-area networks

• But . . .
  • Not practical
  • Embedded decisions are wrong

[Diagram: the home store exporting a distributed file system to the remote clusters over the Internet.]

Page 12: Explicit Control in a  Batch-aware Distributed  File System

DFS’s make bad decisions

• Caching
  • Must guess what and how to cache

• Consistency
  • Output: must guess when to commit
  • Input: needs a mechanism to invalidate caches

• Replication
  • Must guess what to replicate

Page 13: Explicit Control in a  Batch-aware Distributed  File System

BAD-FS makes good decisions

• Removes the guesswork
  • The scheduler has detailed workload knowledge
  • The storage layer allows external control
  • The scheduler makes informed storage decisions

• Retains the simplicity and elegance of a DFS
  • Practical and deployable

Page 14: Explicit Control in a  Batch-aware Distributed  File System

Outline

• Introduction
• Batch computing
  • Systems
  • Workloads
  • Environment
  • Why not DFS?
• Our answer: BAD-FS
  • Design
  • Experimental evaluation
• Conclusion

Page 15: Explicit Control in a  Batch-aware Distributed  File System

Practical and deployable

• User-level; requires no privilege
• Packaged as a modified batch system
  • A new batch system which includes BAD-FS
  • General; will work on all batch systems
  • Tested thus far on multiple batch systems

[Diagram: BAD-FS storage servers running alongside SGE batch systems on several remote clusters, all reachable from the home store over the Internet.]

Page 16: Explicit Control in a  Batch-aware Distributed  File System

Contributions of BAD-FS

[Diagram: the batch-computing architecture from before, extended with four new components:]

1) Storage managers (one per compute node, alongside the CPU manager)
2) The Batch-Aware Distributed File System itself
3) An expanded job description language
4) The BAD-FS scheduler (replacing the ordinary batch scheduler)

Page 17: Explicit Control in a  Batch-aware Distributed  File System

BAD-FS knowledge

• Remote cluster knowledge
  • Storage availability
  • Failure rates

• Workload knowledge
  • Data type (batch, pipeline, or endpoint)
  • Data quantity
  • Job dependencies

Page 18: Explicit Control in a  Batch-aware Distributed  File System

Control through volumes

• Guaranteed storage allocations
  • Containers for job I/O

• The scheduler (see the sketch below)
  • Creates volumes to cache input data
    • Subsequent jobs can reuse this data
  • Creates volumes to buffer output data
    • Destroys pipeline data, copies endpoint data home
  • Configures the workload to access these containers

Page 19: Explicit Control in a  Batch-aware Distributed  File System

Knowledge plus control

• Enhanced performance
  • I/O scoping
  • Capacity-aware scheduling

• Improved failure handling
  • Cost-benefit replication

• Simplified implementation
  • No cache consistency protocol

Page 20: Explicit Control in a  Batch-aware Distributed  File System

I/O scoping

• A technique to minimize wide-area traffic
• Allocate storage to cache batch data
• Allocate storage for pipeline and endpoint data
• Extract only the endpoint data

AMANDA: 200 MB pipeline, 500 MB batch, 5 MB endpoint per job.

[Diagram: the BAD-FS scheduler directs compute nodes to cache batch data and buffer pipeline data locally; in steady state only 5 of the 705 MB traverse the wide area.]
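The arithmetic behind that steady-state claim, as a small worked sketch using the per-job figures from the slide:

    # Per-job I/O for AMANDA, from the slide (in MB).
    pipeline_mb, batch_mb, endpoint_mb = 200, 500, 5
    total_mb = pipeline_mb + batch_mb + endpoint_mb       # 705 MB touched per job

    # Without scoping, remote I/O could drag all 705 MB across the wide area.
    # With I/O scoping, batch data is cached once at the remote cluster,
    # pipeline data never leaves it, and only endpoint data goes home.
    wide_area_mb = endpoint_mb                            # steady state: 5 MB
    print(f"{wide_area_mb} of {total_mb} MB traverse the wide area")   # 5 of 705 MB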

Page 21: Explicit Control in a  Batch-aware Distributed  File System

Capacity-aware scheduling

• A technique to avoid over-allocation
• The scheduler runs only as many jobs as fit in the available storage (sketched below)
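A rough sketch of the idea, assuming each job declares how much volume space it needs; the job names and sizes are illustrative.

    # Sketch of capacity-aware admission: start another job only while its
    # volumes still fit in the cluster's remaining storage.
    def admit(jobs, storage_mb):
        running, free = [], storage_mb
        for job in jobs:
            if job["space_mb"] <= free:      # volumes for this job still fit
                running.append(job["name"])
                free -= job["space_mb"]
            else:
                break                        # defer the rest until space is freed
        return running, free

    jobs = [{"name": f"pipe{i}", "space_mb": 700} for i in range(64)]
    running, free = admit(jobs, storage_mb=16_000)
    print(len(running), "of", len(jobs), "pipelines admitted")   # 22 of 64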

Page 22: Explicit Control in a  Batch-aware Distributed  File System

Capacity-aware scheduling

[Diagram: the batch-pipeline workload (endpoint, pipeline, and batch-dataset I/O) annotated to illustrate capacity-aware scheduling.]

Page 23: Explicit Control in a  Batch-aware Distributed  File System

Capacity-aware scheduling

• 64 batch-intensive synthetic pipelines
  • Vary the size of the batch data
• 16 compute nodes

Page 24: Explicit Control in a  Batch-aware Distributed  File System

Improved failure handling

• The scheduler understands data semantics
  • Data is not just a collection of bytes
  • Losing data is not catastrophic
    • Output can be regenerated by rerunning jobs

• Cost-benefit replication (sketched below)
  • Replicates only data whose replication cost is cheaper than the cost of rerunning the job

• Results in the paper
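A minimal sketch of such a cost-benefit check, assuming the scheduler can estimate transfer time and rerun cost; the estimate formula and numbers here are hypothetical placeholders.

    # Sketch of cost-benefit replication: replicate a job's output only when
    # copying it is cheaper than the expected cost of regenerating it.
    def should_replicate(output_mb, bandwidth_mbps, runtime_s, failure_prob):
        replication_cost = output_mb * 8 / bandwidth_mbps   # seconds to copy the output
        rerun_cost = failure_prob * runtime_s               # expected seconds lost to a rerun
        return replication_cost < rerun_cost

    # A long-running job on a failure-prone node is worth protecting ...
    print(should_replicate(output_mb=200, bandwidth_mbps=100, runtime_s=7200, failure_prob=0.1))  # True
    # ... a short job that is cheap to rerun is not.
    print(should_replicate(output_mb=200, bandwidth_mbps=100, runtime_s=60, failure_prob=0.1))    # False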

Page 25: Explicit Control in a  Batch-aware Distributed  File System

Simplified implementation

• Data dependencies are known
  • The scheduler ensures proper ordering

• No need for a cache consistency protocol in the cooperative cache
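Why ordering removes the need for consistency machinery: if a job starts only after every job that writes its inputs has finished, no cache can hold a stale copy of a file a running job reads. A small sketch of that ordering, using a topological sort over an illustrative job graph:

    # Sketch: run jobs in dependency order so a consumer never starts before
    # its producers finish -- cached copies therefore never need invalidation.
    from graphlib import TopologicalSorter

    deps = {                       # job -> jobs whose output it reads
        "stage1": [],
        "stage2": ["stage1"],
        "stage3": ["stage2"],
    }
    for job in TopologicalSorter(deps).static_order():
        print("run", job)          # stage1, stage2, stage3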

Page 26: Explicit Control in a  Batch-aware Distributed  File System

Real workloads

• AMANDA
  • Astrophysics: study of cosmic events such as gamma-ray bursts
• BLAST
  • Biology: search for proteins within a genome
• CMS
  • Physics: simulation of large particle colliders
• HF
  • Chemistry: study of non-relativistic interactions between atomic nuclei and electrons
• IBIS
  • Ecology: global-scale simulation of Earth's climate, used to study the effects of human activity (e.g. global warming)

Page 27: Explicit Control in a  Batch-aware Distributed  File System

Real workload experience

• Setup
  • 16 jobs
  • 16 compute nodes
  • Emulated wide-area network

• Configurations
  • Remote I/O
  • AFS-like with /tmp
  • BAD-FS

• Result: an order-of-magnitude improvement

Page 28: Explicit Control in a  Batch-aware Distributed  File System

BAD Conclusions

• Existing DFS's are insufficient
• Schedulers have workload knowledge
• Schedulers need storage control
  • Caching
  • Consistency
  • Replication
• Combining this control with knowledge gives
  • Enhanced performance
  • Improved failure handling
  • Simplified implementation

Page 29: Explicit Control in a  Batch-aware Distributed  File System

For more information

• http://www.cs.wisc.edu/adsl
• http://www.cs.wisc.edu/condor

• Questions?

Page 30: Explicit Control in a  Batch-aware Distributed  File System

Why not BAD-scheduler and traditional DFS?

• Cooperative caching
• Data sharing
  • A traditional DFS assumes sharing is the exception and provisions for arbitrary, unplanned sharing
  • In batch workloads sharing is the rule, and sharing behavior is completely known

• Data committal (sketched below)
  • A traditional DFS must guess when to commit
    • AFS commits on close; NFS uses a 30-second window
  • Batch workloads define precisely when to commit
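To make the data-committal contrast concrete, a small sketch where the commit point is the successful exit of the job that produced the file, rather than a guess based on close() or a timer; the file and job names are illustrative.

    # Sketch: in a batch workload the commit point is explicit -- output is
    # pushed home exactly when its producing job exits successfully.
    def run_and_commit(job, outputs, endpoint):
        print(f"running {job}")
        # ... the job writes its outputs into a local volume ...
        for f in outputs:
            if f in endpoint:
                print(f"commit {f} to home storage")        # exactly once, at job completion
            else:
                print(f"keep {f} local for downstream jobs")

    run_and_commit("stage2", outputs=["final.out", "stage2.tmp"], endpoint={"final.out"})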

Page 31: Explicit Control in a  Batch-aware Distributed  File System

Is capacity-aware scheduling important in the real world?

1. Heterogeneity of remote resources
2. Shared disks
3. Workloads are changing, and some are very, very large.

Page 32: Explicit Control in a  Batch-aware Distributed  File System

Capacity-aware scheduling

• Goal
  • Avoid over-allocations
    • Cache thrashing
    • Write failures

• Method (see the sketch below)
  • Breadth-first
  • Depth-first
  • Idleness
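For intuition, a sketch of the two traversal orders over a batch-pipeline workload: which one the scheduler picks determines whether storage fills with many half-finished pipelines (breadth-first) or with fewer complete ones (depth-first). The job naming is illustrative.

    # Sketch: breadth-first vs depth-first ordering of a 4-pipeline, 3-stage workload.
    pipelines, stages = 4, 3

    breadth_first = [f"pipe{p}.stage{s}" for s in range(stages) for p in range(pipelines)]
    depth_first   = [f"pipe{p}.stage{s}" for p in range(pipelines) for s in range(stages)]

    # Breadth-first keeps every pipeline's intermediate data live at once;
    # depth-first finishes one pipeline (freeing its pipeline volumes) before
    # starting the next, so it needs far less simultaneous storage.
    print(breadth_first[:4])   # ['pipe0.stage0', 'pipe1.stage0', 'pipe2.stage0', 'pipe3.stage0']
    print(depth_first[:4])     # ['pipe0.stage0', 'pipe0.stage1', 'pipe0.stage2', 'pipe1.stage0']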

Page 33: Explicit Control in a  Batch-aware Distributed  File System

Capacity-aware scheduling evaluation

• Workload
  • 64 synthetic pipelines
  • Varied pipeline size
• Environment
  • 16 compute nodes
• Configurations
  • Breadth-first
  • Depth-first
  • BAD-FS

Failures correlate directly with workload throughput.