DIOS: Dynamic Instrumentation for (not so) Outstanding Scheduling
Blake Sutton & Chris Sosa

Upload: awesomesos

Post on 20-Nov-2014


DESCRIPTION

Presentation for an OS class on DIOS, our scheduling system that used real-time attributes from hardware systems to change scheduling behavior.

TRANSCRIPT

Page 1: DIOS

DIOS: Dynamic Instrumentation for (not so) Outstanding Scheduling
Blake Sutton & Chris Sosa

Page 2: DIOS

Motivation

Scheduling jobs on a group of machines
  Cluster
  Distributed operating system

Don’t know what to expect at submission time!

Memory contention

Migrate processes away to a better place...

Page 3: DIOS

Approach: Adaptive Distributed Scheduler

Monitor machines and processes to motivate migration decisions.

Gather application-specific info and feed to local schedulers.

Global scheduler collects local schedulers’ observations and uses information on all machines and all applications to make decisions.
  Migrate? Which one? Where?
  Pause? Which one? How long?

Page 4: DIOS

Dynamic Instrumentation with Pin

Insert new code into apps on the fly
  No recompile
  Operates on copy
  Code cache

Our Pintool
  Routine-level
  Instruction-level

Page 5: DIOS

Application-Specific Information

Want to capture memory behavior over time

We gathered: Ratio of malloc to free calls

Wall-clock time to execute 10,000,000 insns

Number of memory ops in last 2,000,000 insns
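
To make the three measurements concrete, here is a minimal sketch of the bookkeeping they imply. It is not the actual Pintool (a real one would be written in C++ against the Pin API); the class name `MemoryBehaviorTracker`, its methods, and the injectable `clock` are all hypothetical, and only the window sizes of 10,000,000 and 2,000,000 instructions come from the slide.

```python
from collections import deque
import time


class MemoryBehaviorTracker:
    """Hypothetical bookkeeping for the three metrics listed on the slide."""

    def __init__(self, time_window=10_000_000, mem_window=2_000_000,
                 clock=time.monotonic):
        self.time_window = time_window   # instructions per timing window
        self.mem_window = mem_window     # instructions per memory-op window
        self.clock = clock               # injectable so the sketch can be tested
        self.mallocs = 0
        self.frees = 0
        self.insn_count = 0
        self.last_window_seconds = None  # wall-clock time of the last full window
        self._window_start = clock()
        self._mem_op_positions = deque()  # instruction indices of recent memory ops

    def on_malloc(self):
        self.mallocs += 1

    def on_free(self):
        self.frees += 1

    def on_instruction(self, is_memory_op=False):
        self.insn_count += 1
        if is_memory_op:
            self._mem_op_positions.append(self.insn_count)
        # Forget memory ops that fell out of the last `mem_window` instructions.
        cutoff = self.insn_count - self.mem_window
        while self._mem_op_positions and self._mem_op_positions[0] <= cutoff:
            self._mem_op_positions.popleft()
        # Every `time_window` instructions, record how long the window took.
        if self.insn_count % self.time_window == 0:
            now = self.clock()
            self.last_window_seconds = now - self._window_start
            self._window_start = now

    def malloc_free_ratio(self):
        """Ratio of malloc to free calls, or None before the first free."""
        if self.frees == 0:
            return None
        return self.mallocs / self.frees

    def recent_memory_ops(self):
        """Number of memory ops within the last `mem_window` instructions."""
        return len(self._mem_op_positions)
```

In this sketch a local scheduler would poll `malloc_free_ratio()`, `last_window_seconds`, and `recent_memory_ops()` periodically; how DIOS actually exported these values is not stated on the slides.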

Page 6: DIOS

Evaluation

Distributed scheduler
  Rhino on realitytv16, Hare on realitytv13-16
  Looks for % memory free and restarts youngest job
  heatedplate with modified parameters
  Baseline: Queue balancing

Pintool
  2 applications from SPLASH-2
  Heatedplate
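
The "look at % memory free, restart the youngest job" policy could be sketched roughly as follows. This is an illustration under assumptions, not the Rhino/Hare code: the `Job` record, the 10% threshold, and the choice of the host with the most free memory as the restart target are all hypothetical. Only the idea of watching free memory and picking the youngest job comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    host: str
    start_time: float  # larger value means the job started more recently


def pick_job_to_restart(jobs, free_memory_pct, threshold_pct=10.0):
    """Return (job, target_host) for the youngest job on a pressured host.

    free_memory_pct maps host name -> percent of memory currently free.
    threshold_pct is an assumed cutoff; the slides do not give one.
    Returns None when no host is under pressure or no better host exists.
    """
    pressured = {h for h, pct in free_memory_pct.items() if pct < threshold_pct}
    candidates = [j for j in jobs if j.host in pressured]
    if not candidates:
        return None
    # "Youngest" here means most recently started, so the least work is lost.
    youngest = max(candidates, key=lambda j: j.start_time)
    # Assumed placement rule: restart on the host with the most free memory.
    target = max(free_memory_pct, key=free_memory_pct.get)
    if target in pressured:
        return None
    return youngest, target
```

Restarting the youngest job is attractive as a first policy because it throws away the least completed work, at the cost of ignoring which job is actually causing the memory pressure.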

Page 7: DIOS

The Good

Potential for improvement

Lower total runtime with simple policy

Restart youngest

Page 8: DIOS

The Bad

Overhead from Pintool is too high to realize gains
  Pin isn’t designed for on-the-fly analysis
  Couldn’t attach / detach
  Code caching can’t save it

application   native   only pin   count   malloc/free   # mems   latency
heatedplate   1.00     1.88       2.65    5.43          7.45     7.26
ocean         1.00     1.48       2.87    7.84          6.04     5.81
lu            1.00     1.25       6.27    14.51         7.90     7.64
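
One way to read the table is to separate the cost of Pin itself from the cost of each analysis routine. The sketch below assumes the figures are runtimes normalized to the native run (the slide does not say so explicitly, but the native column is 1.00 throughout); the helper name `relative_to_pin` is hypothetical.

```python
# Values copied from the table above; assumed to be runtime normalized to native.
COLUMNS = ["native", "only pin", "count", "malloc/free", "# mems", "latency"]
RESULTS = {
    "heatedplate": [1.00, 1.88, 2.65, 5.43, 7.45, 7.26],
    "ocean": [1.00, 1.48, 2.87, 7.84, 6.04, 5.81],
    "lu": [1.00, 1.25, 6.27, 14.51, 7.90, 7.64],
}


def relative_to_pin(row):
    """Re-normalize one row so that running under bare Pin counts as 1.0."""
    pin = row[COLUMNS.index("only pin")]
    return {col: round(value / pin, 2) for col, value in zip(COLUMNS, row)}


if __name__ == "__main__":
    for app, row in RESULTS.items():
        print(app, relative_to_pin(row))
```

Under that assumption, most of the slowdown comes from the inserted analysis code rather than from Pin alone, which fits the slide's point that code caching cannot recover the cost.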

Page 9: DIOS

The “Interesting”

Pintool does capture intriguing info…

Page 10: DIOS

Conclusion: the Future of DIOS

Overhead is prohibitive – for now
  Add attach / detach
  Lighter instrumentation framework

But instrumentation can capture aspects of application-specific behavior!

Marty was right.

Find out the final answer: 9am 5/9, MEC215.

Page 11: DIOS

Questions?

Page 12: DIOS

Wait…hasn’t this been solved?

Condor
  popular user-space distributed scheduler
  process migration
  tries to keep queues balanced
  but jobs have different behavior over time from each other

LSF (Load Sharing Facility)
  monitors system, moves processes around based on what they need
  must input static job information (requires profiling etc. beforehand)
  what if something about your job isn't captured by your input?
  what if you end up giving it margins that are too large? too small?
  unnecessary inefficiencies?
  it's not exactly hassle-free...

Hardware feedback
  PAPI
  Still not very portable (invasive kernel patch for install)

Wouldn't it be nice if the scheduler could just..."do the right thing"?