Workload Modeling and its Effect on Performance Evaluation Dror Feitelson Hebrew University


  • Workload Modeling
    and its Effect on
    Performance Evaluation

    Dror Feitelson

    Hebrew University

    Thanks to participants and program committee; thanks to Monien; abuse hospitality to talk about agenda

  • Performance Evaluation

    In system design
      Selection of algorithms
      Setting parameter values
    In procurement decisions
      Value for money
      Meet usage goals
    For capacity planning

    Important and basic activity

  • The Good Old Days

    The skies were blue
    The simulation results were conclusive
    Our scheme was better than theirs

    Feitelson & Jette, JSSPP 1997

    Focus on system design. Widely different designs lead to conclusive results.

  • But in their papers,

    Their scheme was better than ours!

    But literature is full of contradictory results.

  • How could they be so wrong?

    Leads to question of what is the cause for contradictions.

  • Performance evaluation depends on:

    The systems design

    (What we teach in algorithms and data structures)

    Its implementation

    (What we teach in programming courses)

    The workload to which it is subjected
    The metric used in the evaluation
    Interactions between these factors

    Next: our focus is the workloads.

  • Outline for Today

    Three examples of how workloads affect performance evaluation
    Workload modeling
      Getting data
      Fitting, correlations, stationarity
      Heavy tails, self similarity
    Research agenda

    In the context of parallel job scheduling

    Job scheduling, not task scheduling

  • Example #1

    Gang Scheduling and

    Job Size Distribution

  • Gang What?!?

    Time slicing parallel jobs with coordinated context switching

    Ousterhout

    matrix

    Ousterhout, ICDCS 1982

  • Gang What?!?

    Time slicing parallel jobs with coordinated context switching

    Ousterhout

    matrix

    Optimization:

    Alternative

    scheduling

    Ousterhout, ICDCS 1982

  • Packing Jobs

    Use a buddy system for allocating processors

    Feitelson & Rudolph, Computer 1990

  • Packing Jobs

    Use a buddy system for allocating processors

    Start with full system in one block

  • Packing Jobs

    Use a buddy system for allocating processors

    To allocate repeatedly partition in two to get desired size

  • Packing Jobs

    Use a buddy system for allocating processors

    Or use existing partition
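The buddy allocation steps above can be sketched in a few lines of Python. This is a minimal illustration under my own naming (`BuddyAllocator` and its methods are hypothetical), not the implementation evaluated in the cited papers:

```python
def round_up_pow2(n):
    """Smallest power of two >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

class BuddyAllocator:
    """Buddy system over a power-of-two machine.
    free[s] holds the start offsets of free blocks of size s."""
    def __init__(self, machine_size):
        self.free = {machine_size: [0]}  # start with the full system in one block

    def allocate(self, job_size):
        size = round_up_pow2(job_size)  # internal fragmentation happens here
        # use an existing partition if one fits, else the smallest larger block
        candidates = [s for s, blocks in self.free.items() if s >= size and blocks]
        if not candidates:
            return None
        s = min(candidates)
        start = self.free[s].pop()
        # repeatedly partition in two to get the desired size
        while s > size:
            s //= 2
            self.free.setdefault(s, []).append(start + s)  # keep the upper buddy free
        return (start, size)
```

Note how a request for 3 processors is rounded up to a block of 4: the internal fragmentation the slide asks about, traded against the predefined processor groups that help alternative scheduling.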

  • The Question:

    The buddy system leads to internal fragmentation
    But it also improves the chances of alternative scheduling, because processors are allocated in predefined groups

    Which effect dominates the other?

  • The Answer (part 1):

    Feitelson & Rudolph, JPDC 1996

    Answer as function of workload, but not full answer because workload unknown. Dashed lines: provable bounds.

  • The Answer (part 2):

    Note logarithmic Y axis

  • The Answer (part 2):

    Many small jobs
    Many sequential jobs
    Many power of two jobs
    Practically no jobs use full machine

    Conclusion: buddy system should work well

  • Verification

    Feitelson, JSSPP 1996

    Using Feitelson workload

  • Example #2

    Parallel Job Scheduling

    and Job Scaling

  • Variable Partitioning

    Each job gets a dedicated partition for the duration of its execution
    Resembles 2D bin packing
    Packing large jobs first should lead to better performance
    But what about correlation of size and runtime?

    First-fit decreasing is optimal

  • Scaling Models

    Constant work
      Parallelism for speedup: Amdahl's Law
      Large first ⇒ SJF
    Constant time
      Size and runtime are uncorrelated
    Memory bound
      Large first ⇒ LJF
      Full-size jobs lead to blockout

    Worley, SIAM JSSC 1990

    Question is which model applies within the context of a single machine

  • Scan Algorithm

    Keep jobs in separate queues according to size (sizes are powers of 2)
    Serve the queues Round Robin, scheduling all jobs from each queue (they pack perfectly)
    Assuming constant work model, large jobs only block the machine for a short time
    But the memory bound model would lead to excessive queueing of small jobs

    Krueger et al., IEEE TPDS 1994

    Important point: schedule order determined by size
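The queueing discipline above can be sketched as follows. Names are illustrative, and the sketch only produces the order of concurrent batches, abstracting away runtimes:

```python
from collections import deque

def scan_schedule(jobs, machine_size):
    """Sketch of the Scan idea: queue jobs by power-of-two size,
    serve the queues round-robin, draining each queue completely.
    Jobs of equal (power-of-two) size pack perfectly side by side."""
    queues = {}  # size class -> deque of job ids
    for job_id, size in jobs:
        s = 1
        while s < size:
            s *= 2
        queues.setdefault(s, deque()).append(job_id)
    order = []
    for s in sorted(queues):              # one full scan over the size classes
        assert s <= machine_size, "job larger than machine"
        q = queues[s]
        while q:                          # drain this class before moving on
            batch, used = [], 0
            while q and used + s <= machine_size:
                batch.append(q.popleft())
                used += s
            order.append(batch)           # jobs in a batch run concurrently
    return order
```

The key property the slide emphasizes is visible here: the schedule order is determined purely by job size, which is why the scaling model (does size predict runtime?) decides whether small jobs starve.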

  • The Data

    Data: SDSC Paragon, 1995/6

  • The Data

    Data: SDSC Paragon, 1995/6

    Partitions with equal numbers of jobs; many more small jobs.

  • The Data

    Data: SDSC Paragon, 1995/6

    Similar range, different shape; 80th percentile moves from

  • Conclusion

    Parallelism used for better results, not for faster results
    Constant work model is unrealistic
    Memory bound model is reasonable
    Scan algorithm will probably not perform well in practice
  • Example #3

    Backfilling and

    User Runtime Estimation

  • Backfilling

    Variable partitioning can suffer from external fragmentation
    Backfilling optimization: move jobs forward to fill in holes in the schedule
    Requires knowledge of expected job runtimes
  • Variants

    EASY backfilling

    Make reservation for first queued job

    Conservative backfilling

    Make reservation for all queued jobs
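The EASY admission test can be sketched as a single predicate. Here `shadow_time` (when the first queued job is guaranteed to start) and `extra_nodes` (processors still free even after it starts) are assumed to be precomputed from the current schedule; the names are conventional descriptions, not code from the systems discussed:

```python
def can_backfill(candidate, free_now, now, shadow_time, extra_nodes):
    """EASY backfill check (simplified sketch).
    A candidate job may jump the queue if it fits in the currently free
    processors AND it does not delay the first queued job's reservation:
    either it finishes before the reservation (shadow_time), or it only
    uses processors the reserved job will not need (extra_nodes)."""
    size, estimate = candidate
    if size > free_now:
        return False
    return (now + estimate <= shadow_time) or (size <= extra_nodes)
```

This is where user runtime estimates enter: a lower `estimate` makes the first condition easier to satisfy, which is why estimate accuracy feeds back into the evaluation results below.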

  • User Runtime Estimates

    Lower estimates improve chance of backfilling and better response time
    Too low estimates run the risk of having the job killed
    So estimates should be accurate, right?
  • They Aren't

    Mualem & Feitelson, IEEE TPDS 2001

    Short=failed; killed typically exceeded runtime estimate, ~15%

  • Surprising Consequences

    Inaccurate estimates actually lead to improved performance
    Performance evaluation results may depend on the accuracy of runtime estimates
    Example: EASY vs. conservative
      Using different workloads
      And different metrics

    Will focus on second bullet

  • EASY vs. Conservative

    Using CTC SP2 workload

  • EASY vs. Conservative

    Using Jann workload model

    Note: Jann model of CTC

  • EASY vs. Conservative

    Using Feitelson workload model

  • Conflicting Results Explained

    Jann uses accurate runtime estimates
    This leads to a tighter schedule
    EASY is not affected too much
    Conservative manages less backfilling of long jobs, because it respects more reservations

    Relative measure: more by EASY = less by conservative

  • Conservative is bad for the long jobs
    Good for short ones that are respected

    (figure: Conservative vs. EASY schedules)
  • Conflicting Results Explained

    Response time sensitive to long jobs, which favor EASY
    Slowdown sensitive to short jobs, which favor conservative
    All this does not happen at CTC, because estimates are so loose that backfill can occur even under conservative
  • Verification

    Run CTC workload with accurate estimates

  • But What About My Model?

    Simply does not have such small long jobs

  • Workload Data Sources

  • No Data

    Innovative unprecedented systems
      Wireless
      Hand-held
    Use an educated guess
      Self similarity
      Heavy tails
      Zipf distribution
  • Serendipitous Data

    Data may be collected for various reasons
      Accounting logs
      Audit logs
      Debugging logs
      Just-so logs
    Can lead to wealth of information
  • NASA Ames iPSC/860 log

    42050 jobs from Oct-Dec 1993

    user job nodes runtime date time

    user4 cmd8 32 70 11/10/93 10:13:17

    user4 cmd8 32 70 11/10/93 10:19:30

    user42 nqs450 32 3300 11/10/93 10:22:07

    user41 cmd342 4 54 11/10/93 10:22:37

    sysadmin pwd 1 6 11/10/93 10:22:42

    user4 cmd8 32 60 11/10/93 10:25:42

    sysadmin pwd 1 3 11/10/93 10:30:43

    user41 cmd342 4 126 11/10/93 10:31:32

    Feitelson & Nitzberg, JSSPP 1995

  • Distribution of Job Sizes

  • Distribution of Resource Use

  • Degree of Multiprogramming

  • System Utilization

  • Job Arrivals

  • Arriving Job Sizes

  • Distribution of Interarrival Times

  • Distribution of Runtimes

  • User Activity

  • Repeated Execution

  • Application Moldability

    Of jobs run more than once

  • Distribution of Run Lengths

  • Predictability in Repeated Runs

    For jobs run more than 5 times

  • Recurring Findings

    Many small and serial jobs
    Many power-of-two jobs
    Weak correlation of job size and duration
    Job runtimes are bounded but have CV>1
    Inaccurate user runtime estimates
    Non-stationary arrivals (daily/weekly cycle)
    Power-law user activity, run lengths
  • Instrumentation

    Passive: snoop without interfering
    Active: modify the system
    Collecting the data interferes with system behavior
    Saving or downloading the data causes additional interference
    Partial solution: model the interference
  • Data Sanitation

    Strange things happen
    Leaving them in is safe and faithful to the real data
    But it risks situations in which a non-representative situation dominates the evaluation results
  • Arrivals to SDSC SP2

  • Arrivals to LANL CM-5

  • Arrivals to CTC SP2

  • Arrivals to SDSC Paragon

    What are they doing at 3:30 AM?

  • 3:30 AM

    Nearly every day, a set of 16 jobs are run by the same user
    Most probably the same set, as they typically have a similar pattern of runtimes
    Most probably these are administrative jobs that are executed automatically
  • Arrivals to CTC SP2

  • Arrivals to SDSC SP2

  • Arrivals to LANL CM-5

  • Arrivals to SDSC Paragon

  • Are These Outliers?

    These large activity outbreaks are easily distinguished from normal activity
      They last for several days to a few weeks
      They appear at intervals of several months to more than a year
      They are each caused by a single user!
    Therefore easy to remove
  • Two Aspects

    In workload modeling, should you include this in the model?
      In a general model, probably not
      Conduct separate evaluation for special conditions (e.g. DOS attack)
    In evaluations using raw workload data, there is a danger of bias due to unknown special circumstances
  • Automation

    The idea:
      Cluster daily data based on various workload attributes
      Remove days that appear alone in a cluster
      Repeat
    The problem:
      Strange behavior often spans multiple days

    Cirne & Berman, Wkshp. Workload Charact. 2001

  • Workload Modeling

  • Statistical Modeling

    Identify attributes of the workload
    Create empirical distribution of each attribute
    Fit empirical distribution to create model
    Synthetic workload is created by sampling from the model distributions
  • Fitting by Moments

    Calculate model parameters to fit moments of empirical data
    Problem: does not fit the shape of the distribution
  • Jann et al, JSSPP 1997

  • Fitting by Moments

    Calculate model parameters to fit moments of empirical data
    Problem: does not fit the shape of the distribution
    Problem: very sensitive to extreme data values
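As a concrete illustration of moment matching, here is a sketch that fits a gamma distribution by its first two moments. The choice of gamma is mine for illustration (the cited models use their own functional forms), and the weaknesses on this slide apply directly: the result tracks the moments, not the shape, and extreme values swing it wildly.

```python
def fit_gamma_by_moments(samples):
    """Match a gamma distribution to the sample mean and variance.
    For gamma: mean = k*theta and variance = k*theta**2, so
    k = 1/CV^2 and theta = mean*CV^2. Assumes non-degenerate data."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    cv2 = var / mean ** 2            # squared coefficient of variation
    k = 1.0 / cv2                    # shape parameter
    theta = mean * cv2               # scale parameter
    return k, theta
```

Adding a single extreme runtime to `samples` inflates `var`, shrinks `k`, and stretches `theta`, which is exactly the sensitivity quantified in the table that follows.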
  • Effect of Extreme Runtime Values

    Downey & Feitelson, PER 1999

    Change when top records omitted:

    omit     mean    CV
    0.01%   -2.1%   -29%
    0.02%   -3.0%   -35%
    0.04%   -3.7%   -39%
    0.08%   -4.6%   -39%
    0.16%   -5.7%   -42%
    0.31%   -7.1%   -42%
  • Alternative: Fit to Shape

    Maximum likelihood: what distribution parameters were most likely to lead to the given observations
      Needs initial guess of functional form
    Phase type distributions
      Construct the desired shape
    Goodness of fit
      Kolmogorov-Smirnov: difference in CDFs
      Anderson-Darling: added emphasis on tail
      May need to sample observations
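A toy sketch of the fit-to-shape approach: a lognormal fitted by maximum likelihood (for a lognormal, the MLE is simply the mean and standard deviation of the log-transformed data) plus a hand-rolled Kolmogorov-Smirnov distance against the empirical CDF. This is an illustration, not a production fitting routine:

```python
import math

def fit_lognormal_mle(samples):
    """MLE for a lognormal: mu and sigma are the mean and (population)
    standard deviation of log(x). Assumes strictly positive samples."""
    logs = [math.log(x) for x in samples]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / len(logs))
    return mu, sigma

def ks_distance(samples, cdf):
    """Kolmogorov-Smirnov statistic: maximum gap between the empirical
    CDF (a step function) and the model CDF, checked at both step edges."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        d = max(d, abs((i + 1) / n - cdf(x)), abs(i / n - cdf(x)))
    return d
```

Because K-S measures the worst gap in the body of the CDF, it barely sees the tail; that is why the slide mentions Anderson-Darling as the tail-weighted alternative.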
  • Correlations

    Correlation can be measured by the correlation coefficient
    It can be modeled by a joint distribution function
    Both may not be very useful
  • Correlation Coefficient

    Gives low results for correlation of runtime and size in parallel systems

    system          CC
    CTC SP2        -0.029
    KTH SP2         0.011
    SDSC SP2        0.145
    LANL CM-5       0.211
    SDSC Paragon    0.305
  • Distributions

    A restricted version of a joint distribution

  • Modeling Correlation

    Divide range of one attribute into sub-ranges
    Create a separate model of other attribute for each sub-range
    Models can be independent, or model parameter can depend on sub-range
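A sketch of this sub-range approach, using an empirical runtime pool per size range (the function names and range boundaries are illustrative):

```python
import bisect
import random

def build_subrange_model(jobs, boundaries):
    """Partition jobs by size sub-range and pool each range's runtimes.
    boundaries: sorted upper edges, e.g. [1, 8, 64] gives the ranges
    (<=1), (2..8), (9..64), (>64)."""
    pools = [[] for _ in range(len(boundaries) + 1)]
    for size, runtime in jobs:
        pools[bisect.bisect_left(boundaries, size)].append(runtime)
    return pools

def sample_runtime(pools, boundaries, size, rng=random):
    """Draw a runtime from the empirical pool of the matching size range,
    so the size-runtime dependence is captured without a joint distribution."""
    return rng.choice(pools[bisect.bisect_left(boundaries, size)])
```

The pools here are independent empirical models; the variant mentioned on the slide would instead fit one parametric form whose parameters are a function of the sub-range.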
  • Stationarity

    Problem of daily/weekly activity cycle
      Not important if unit of activity is very small (network packet)
      Very meaningful if unit of work is long (parallel job)
  • How to Modify the Load

    Multiply interarrivals or runtimes by a factor
      Changes the effective length of the day
    Multiply machine size by a factor
      Modifies packing properties
    Add users
  • Stationarity

    Problem of daily/weekly activity cycle
      Not important if unit of activity is very small (network packet)
      Very meaningful if unit of work is long (parallel job)
    Problem of new/old system
      Immature workload
      Leftover workload
  • Heavy Tails

  • Tail Types

    When a distribution has mean m, what is the distribution of samples that are larger than x?

    Light: expected to be smaller than x+m
    Memoryless: expected to be x+m
    Heavy: expected to be larger than x+m
  • Formal Definition

    Tail decays according to a power law: Pr[X > x] ~ x^-a

    Test: log-log complementary distribution plot, where
    log Pr[X > x] = -a log x appears as a straight line with slope -a

  • Consequences

    Large deviations from the mean are realistic
    Mass disparity
      Small fraction of samples responsible for large part of total mass
      Most samples together account for negligible part of mass

    Crovella, JSSPP 2001

  • Unix File Sizes Survey, 1993

  • Unix File Sizes LLCD

  • Consequences

    Large deviations from the mean are realistic
    Mass disparity
      Small fraction of samples responsible for large part of total mass
      Most samples together account for negligible part of mass
    Infinite moments
      For a ≤ 1 the mean is undefined
      For a ≤ 2 the variance is undefined

    Crovella, JSSPP 2001

  • Pareto Distribution

    With parameter a, the density is proportional to x^-(a+1)

    For a ≤ 1 the expectation is then unbounded: the running mean of the
    samples does not converge, i.e. it grows with the number of samples
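This behavior is easy to reproduce by inverse-transform sampling; the sketch below uses illustrative names and the standard Pareto form on x ≥ 1:

```python
import random

def pareto_sample(a, rng=random):
    """Inverse-transform sample from a Pareto with tail index a (x >= 1):
    Pr[X > x] = x**(-a), so X = U**(-1/a) for uniform U in (0, 1]."""
    u = 1.0 - rng.random()   # map [0, 1) to (0, 1] to avoid division by zero
    return u ** (-1.0 / a)

def running_means(a, n, seed=0):
    """Running mean of n Pareto(a) samples. For a <= 1 there is no finite
    expectation, so the running mean keeps drifting upward with n."""
    rng = random.Random(seed)
    total, means = 0.0, []
    for i in range(1, n + 1):
        total += pareto_sample(a, rng)
        means.append(total / i)
    return means
```

Plotting `running_means(0.8, 100000)` shows the staircase pattern of the "Pareto Samples" figures: long flat stretches punctuated by jumps whenever a tail sample arrives.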

  • Pareto Samples

  • Effect of Samples from Tail

    In simulation:
      A single sample may dominate results
      Example: response times of processes
    In analysis:
      Average long-term behavior may never happen in practice
  • Real Life

    Data samples are necessarily bounded
    The question is how to generalize to the model distribution
      Arbitrary truncation
      Lognormal or phase-type distributions
      Something in between
  • Solution 1: Truncation

    Postulate an upper bound on the distribution
    Question: where to put the upper bound
    Probably OK for qualitative analysis
    May be problematic for quantitative simulations
  • Solution 2: Model the Sample

    Approximate the empirical distribution using a mixture of exponentials (e.g. phase-type distributions)
    In particular, exponential decay beyond highest sample
    In some cases, a lognormal distribution provides a good fit
    Good for mathematical analysis
  • Solution 3: Dynamic

    Place an upper bound on the distribution
    Location of bound depends on total number of samples required
    Example:

    Note: does not change during simulation
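One way to make the dynamic choice concrete (a sketch under my own convention, assuming a Pareto tail): place the bound where the expected number of exceedances among the N required samples is about one, then reject anything beyond it.

```python
import random

def dynamic_bound(a, num_samples):
    """Truncation point for Pareto(a) so that, among num_samples draws,
    the expected count beyond the bound is about one:
    N * Pr[X > x] = N * x**(-a) = 1  =>  x = N**(1/a)."""
    return num_samples ** (1.0 / a)

def truncated_pareto_sample(a, bound, rng=random):
    """Rejection-sample Pareto(a) truncated at a precomputed bound >= 1.
    The bound is fixed before the run and does not change during simulation."""
    while True:
        u = 1.0 - rng.random()
        x = u ** (-1.0 / a)
        if x <= bound:
            return x
```

With more samples requested, the bound moves out and longer tail values become admissible, which is the sense in which the truncation is "dynamic" while still being fixed for any one simulation run.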

  • Self Similarity

  • The Phenomenon

    The whole has the same structure as certain parts
    Example: fractals
  • The Phenomenon

    The whole has the same structure as certain parts
    Example: fractals
    In workloads: burstiness at many different time scales

    Note: relates to a time series

  • Job Arrivals to SDSC Paragon

  • Process Arrivals to SDSC Paragon

  • Long-Range Correlation

    A burst of activity implies that values in the time series are correlated
    A burst covering a large time frame implies correlation over a long range
    This is contrary to assumptions about the independence of samples
  • Aggregation

    Replace each subsequence of m consecutive values by their mean
    If self-similar, the new series will have statistical properties that are similar to the original (i.e. bursty)
    If independent, will tend to average out
  • Poisson Arrivals

  • Tests

    Essentially based on the burstiness-retaining nature of aggregation
    Rescaled range (R/s) metric: the range (sum) of n samples as a function of n
  • R/s Metric

  • Tests

    Essentially based on the burstiness-retaining nature of aggregation
    Rescaled range (R/s) metric: the range (sum) of n samples as a function of n
    Variance-time metric: the variance of an aggregated time series as a function of the aggregation level
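The variance-time variant of these tests can be sketched as follows (illustrative code, not a rigorous estimator; assumes aggregation levels that divide into the series length):

```python
def aggregate(series, m):
    """Replace each block of m consecutive values by its mean."""
    k = len(series) // m
    return [sum(series[i * m:(i + 1) * m]) / m for i in range(k)]

def variance(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def variance_time(series, levels):
    """Variance of the aggregated series at each aggregation level m.
    For i.i.d. data the variance decays like 1/m; a slower decay
    (slope > -1 on a log-log plot) is the signature of self-similarity."""
    return {m: variance(aggregate(series, m)) for m in levels}
```

In practice one plots log variance against log m and reads off the slope, which relates to the Hurst parameter via slope = 2H - 2.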
  • Variance Time Metric

  • Modeling Self Similarity

    Generate workload by an on-off process
      During on period, generate work at steady pace
      During off period, do nothing
    On and off period lengths are heavy tailed
    Multiplex many such sources
    Leads to long-range correlation
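A sketch of such a generator, with Pareto-distributed on/off period lengths and arbitrary illustrative parameter values:

```python
import random

def heavy_tailed(a, rng):
    """Pareto(a) duration, x >= 1; heavy-tailed for small a."""
    return (1.0 - rng.random()) ** (-1.0 / a)

def onoff_source(length, a, rng):
    """One source: alternate heavy-tailed ON periods (1 unit of work
    per time step) and OFF periods (idle)."""
    out, on = [], True
    while len(out) < length:
        # cap the period length just to keep this sketch cheap
        dur = min(int(heavy_tailed(a, rng)) + 1, length)
        out.extend([1 if on else 0] * dur)
        on = not on
    return out[:length]

def multiplex(num_sources, length, a=1.2, seed=0):
    """Sum many on-off sources. With heavy-tailed period lengths the
    aggregate arrival series exhibits long-range correlation."""
    rng = random.Random(seed)
    series = [0] * length
    for _ in range(num_sources):
        for t, v in enumerate(onoff_source(length, a, rng)):
            series[t] += v
    return series
```

Feeding the output of `multiplex` to the variance-time test above is one way to check that the burstiness survives aggregation, unlike a Poisson stream.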
  • Research Areas

  • Effect of Users

    Workload is generated by users
    Human users do not behave like a random sampling process
      Feedback based on system performance
      Repetitive working patterns
  • Feedback

    User population is finite
    Users back off when performance is inadequate

    Negative feedback

    Better system stability

    Need to explicitly model this behavior
  • Locality of Sampling

    Users display different levels of activity at different times
    At any given time, only a small subset of users is active
  • Active Users

  • Locality of Sampling

    Users display different levels of activity at different times
    At any given time, only a small subset of users is active
    These users repeatedly do the same thing
    Workload observed by system is not a random sample from long-term distribution
  • SDSC Paragon Data

  • Growing Variability

  • Locality of Sampling

    The questions:

    How does this affect the results of performance evaluation?
    Can this be exploited by the system, e.g. by a scheduler?
  • Hierarchical Workload Models

    Model of user population
      Modify load by adding/deleting users
    Model of a single user's activity
      Built-in self similarity using heavy-tailed on/off times
    Model of application behavior and internal structure
      Capture interaction with system attributes
  • A Small Problem

    We don't have data for these models
    Especially for user behavior such as feedback
      Need interaction with cognitive scientists
    And for distribution of application types and their parameters
      Need detailed instrumentation
  • Final Words

  • We like to think that we design systems based on solid foundations

  • But beware:

    the foundations might be baseless assumptions!

  • We should have more science in computer science:

    Collect data rather than make assumptions
    Run experiments under different conditions
    Make measurements and observations
    Make predictions and verify them
    Share data and programs to promote good practices and ensure comparability

    Computer Systems are Complex

    Science = experimental science, like physics, chemistry, biology

  • Advice from the Experts

    Science is built of facts as a house is built of stones. But a collection of facts is no more a science than a heap of stones is a house

    -- Henri Poincaré

  • Advice from the Experts

    Science is built of facts as a house is built of stones. But a collection of facts is no more a science than a heap of stones is a house

    -- Henri Poincaré

    Everything should be made as simple as possible, but not simpler

    -- Albert Einstein

  • Acknowledgements

    Students: Ahuva Mualem, David Talby,

    Uri Lublin

    Larry Rudolph / MIT
    Data in Parallel Workloads Archive:
      Joefon Jann / IBM
      Allen Downey / Wellesley
      CTC SP2 log / Steven Hotovy
      SDSC Paragon log / Reagan Moore
      SDSC SP2 log / Victor Hazelwood
      LANL CM-5 log / Curt Canada
      NASA iPSC/860 log / Bill Nitzberg
