Thesis Proposal, Zachary Kurmas (v4.0– 24 April 03)
![Page 1: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/1.jpg)
1
Thesis Proposal
Zachary Kurmas
(v4.0– 24 April 03)
![Page 2: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/2.jpg)
2
Outline
• Motivation and discussion of problem
• Overview of solution
• Contributions
• Proposal
• Future Work
• Timeline
• Details of solution (time permitting)
![Page 3: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/3.jpg)
3
Typical disk array
[Figure: typical disk array; labels: Hosts, Fibre Channel, Controller A (Cache), Controller B (Cache), SCSI Buses]
![Page 4: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/4.jpg)
4
Motivation
• Potential storage system designs and automated configuration algorithms must be evaluated with respect to some set of workloads.
  • Ideally, these workloads are actual production workloads.
  • This is usually impossible.
• Two alternatives
  • Replay traces of production workloads
  • Construct and use synthetic workloads
![Page 5: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/5.jpg)
5
Problem
• The currently available set of workload traces and synthetic workloads is not sufficient
  • Can’t get enough of the right traces
    • Companies don’t like to give them out
    • No traces of future workloads
  • Quality of synthetic workloads too low
    • High-quality synthetic workload must share certain key properties with production workload
    • These properties are currently found by trial-and-error and domain expertise
![Page 6: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/6.jpg)
6
Solution
• Improve quality of synthetic I/O workloads
• Automatically determine what properties a synthetic workload must share with the production workload on which it is based
[Figure: Production Workload → List of Properties → Synthetic Workload, with a CDF of Response Time plot. Sample trace records: (R,1024,120932,124) (W,8192,120834,126) (W,8192,120844,127) (R,2048,334321,131 ...]
![Page 7: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/7.jpg)
7
Contributions
• Prototype system to automatically determine what properties a synthetic workload must share with the production workload on which it is based
  • Library of possible properties and corresponding generation techniques
  • Algorithm for searching through library
• Examination of tradeoffs between size and complexity of properties and quality of synthetic workloads
• Evaluation of whether improved synthetic workloads enable us to make better design decisions
• Exploration of workload scaling using identified properties
![Page 8: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/8.jpg)
16
Review of problem
• Not enough input for evaluations of storage systems
  • Too few workload traces
  • Traces not always right answer
  • Synthetic workloads are not practical
    • Don’t know precisely what makes synthetic workloads representative
    • Trial-and-error too cumbersome
    • Can’t maintain every conceivable attribute-value
![Page 9: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/9.jpg)
17
Outline
• Discussion of problem
• Overview of solution
• Contributions
• Proposal
• Future Work
• Timeline
• Details of solution (time permitting)
![Page 10: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/10.jpg)
18
Attributes / Attribute-Values
• An attribute is the name or description of a property
  • Read percentage
  • Mean interarrival time
• An attribute-value is the actual value of the measurement (i.e., the actual property).
  • Read percentage of 67
  • Mean interarrival time of 0.8 ms
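As a rough illustration of the attribute / attribute-value distinction, the sketch below computes two attribute-values from a short trace. The record layout (operation, size, location, arrival time) follows the sample records shown on other slides; the function names and the assumption that the last field is an arrival time in ms are illustrative only.

```python
from typing import List, Tuple

# Assumed record layout: (operation, request size, location, arrival time in ms).
Record = Tuple[str, int, int, float]


def read_percentage(trace: List[Record]) -> float:
    """Attribute-value for the "read percentage" attribute."""
    reads = sum(1 for op, _, _, _ in trace if op == "R")
    return 100.0 * reads / len(trace)


def mean_interarrival_time(trace: List[Record]) -> float:
    """Attribute-value for the "mean interarrival time" attribute (ms)."""
    times = sorted(t for _, _, _, t in trace)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


# Sample records taken from the slides.
trace = [
    ("R", 1024, 120932, 124.0),
    ("W", 8192, 120834, 126.0),
    ("W", 8192, 120844, 127.0),
    ("R", 2048, 334321, 131.0),
]

print(read_percentage(trace))         # 50.0
print(mean_interarrival_time(trace))  # (2 + 1 + 4) / 3
```

Here "read percentage" is the attribute, and the number the function returns for this particular trace is the attribute-value.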
![Page 11: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/11.jpg)
19
Requirements of attributes
• Attribute-values are properties of only the workload.
  • Response time is not an attribute because its attribute-value depends on both workload and disk array
• Attributes must be quantifiable
  • “Locality” and “burstiness” are qualitative concepts; “runCount” and “Hurst parameter” are attributes
![Page 12: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/12.jpg)
20
The Distiller
• Automate process of choosing necessary attribute-values
• Input: workload trace and large set of attributes
• Output: set of attributes that identifies those attribute-values that synthetic workload must share with target
• Helps identify type of any necessary attribute missing from library
  • (if no known set of attribute-values leads to a representative synthetic workload)
![Page 13: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/13.jpg)
21
Basic Idea
• Basic Idea
  • Begin with simple attribute-values
    • (distributions of I/O request parameters)
• Iteratively add attribute-values until evaluation of original and synthetic workloads is similar
[Figure: Production Workload → Attribute-value List → Synthetic Workload, with a CDF of Response Time plot. Sample trace records: (R,1024,120932,124) (W,8192,120834,126) (W,8192,120844,127) (R,2048,334321,131 ...]
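The iterative idea on this slide can be sketched as a loop. This is a naive greedy version for illustration only, not the Distiller's actual search (later slides describe a group-based approach). The `error_of` callable is a hypothetical stand-in for "generate a synthetic workload from these attributes, evaluate it, and compare it with the original".

```python
from typing import Callable, List, Sequence


def distill(
    library: Sequence[str],
    initial: Sequence[str],
    error_of: Callable[[List[str]], float],
    threshold: float,
) -> List[str]:
    """Greedily add attributes until the evaluation error is small enough."""
    chosen = list(initial)
    remaining = [a for a in library if a not in chosen]
    while error_of(chosen) > threshold and remaining:
        # Try each candidate and keep the one that lowers the error the most.
        best = min(remaining, key=lambda a: error_of(chosen + [a]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy stand-in: pretend exactly these two attributes matter.
needed = {"request size dist.", "jump distance"}


def toy_error(attrs: List[str]) -> float:
    return 0.1 * len(needed - set(attrs))


library = ["read percentage", "request size dist.", "Hurst parameter", "jump distance"]
result = distill(library, ["read percentage"], toy_error, threshold=0.05)
print(result)
```

A greedy loop like this calls the evaluation once per candidate per iteration, which shows why the number of iterations has to be limited when each evaluation takes many minutes.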
![Page 14: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/14.jpg)
23
Challenges: Which attribute to add
• Each iteration takes many minutes; therefore, we must limit the number of iterations
• Addition of necessary attribute-values does not always result immediately in improvement
• Fewer attributes better
  • Smaller compact representation
  • Less complex generation techniques
  • Generation techniques for some attributes can interfere with each other
    • E.g., distribution of location and jump distance
![Page 15: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/15.jpg)
26
Outline
• Discussion of problem
• Overview of solution
• Contributions
  • The Distiller itself
  • An analysis of the key attributes for many different workloads
  • An analysis of the potential uses of synthetic workloads
• Proposal
• Future Work
• Timeline
• Details of solution (time permitting)
![Page 16: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/16.jpg)
27
The Distiller itself
• Distiller makes generating representative synthetic workloads practical
  • Encourage companies to make evaluation workloads available
    • More accurate / relevant research results
  • Basis for “what-if” evaluations
    • Future estimations
    • Stability estimates
    • Improved relevance of old evaluation workloads (possibly)
• Distiller provides library of attributes and corresponding generation techniques
![Page 17: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/17.jpg)
28
Analysis of key attributes for many workloads
• Attribute-values that lead to representative synthetic workloads describe what makes the workload behave like or unlike other workloads
  • The “essence” of the workload
• Possible to study essence of many different (workload, storage system) pairs and look for interesting trends or patterns
![Page 18: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/18.jpg)
29
Potential benefits of analyses
• Help us learn about how workloads and storage systems interact
  • Attribute-values contain all info necessary to predict behavior
  • Focus researchers’ attention on concentrated information
• Help development of analytical models
• Identify potential areas of improvement
![Page 19: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/19.jpg)
30
Outline
• Discussion of problem
• Overview of solution
• Contributions
• Proposal
  • Evaluate the correctness of the Distiller
  • Examine the attributes chosen for different workload/storage system pairs
  • Show that the resulting synthetic workloads are useful
• Future Work
• Timeline
• Details of solution (time permitting)
![Page 20: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/20.jpg)
31
Evaluate correctness of Distiller
• Show that the Distiller works for:
  • One definition of “representative”: response time distribution
  • Up to three storage systems: FC-60, FC-30, and JBOD (Just a Bunch Of Disks)
  • Several artificial workloads
  • Five production workloads: Open Mail, TPC-C, TPC-H, file system trace
• Stopping Condition:
  • Distiller can correctly identify key attributes for artificial workloads
![Page 21: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/21.jpg)
32
Definition of “representative”
• Design decisions are almost always based on performance. Thus, matching response time distributions should be a stronger condition than most design decisions require
• Distribution of response time stronger condition than mean response time
• Many decisions decide between competing configurations. Showing applicability across storage system configurations is my next evaluation
• Workloads are considered representative when RMS difference between distributions of response time is sufficiently small
[Figure: distributions of response time; axes: Seconds, Number of IOs; examples labeled “representative” and “Not representative”]
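One way to compute an RMS difference between two response time distributions is sketched below. The slides do not say how the difference is sampled or what threshold counts as "sufficiently small", so the evaluation grid (`points`) and the function names here are assumptions.

```python
from bisect import bisect_right
from math import sqrt
from typing import Sequence


def cdf_at(sorted_samples: Sequence[float], x: float) -> float:
    """Empirical CDF: fraction of samples less than or equal to x."""
    return bisect_right(sorted_samples, x) / len(sorted_samples)


def rms_cdf_difference(
    a: Sequence[float], b: Sequence[float], points: Sequence[float]
) -> float:
    """RMS of the difference between two empirical CDFs on a grid of points."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    squares = [(cdf_at(a_sorted, x) - cdf_at(b_sorted, x)) ** 2 for x in points]
    return sqrt(sum(squares) / len(squares))


# Illustrative response times in seconds (made-up values, not measurements).
original = [1.0, 2.0, 3.0, 4.0]
synthetic = [2.0, 3.0, 4.0, 5.0]
grid = [1.0, 2.0, 3.0, 4.0, 5.0]
print(rms_cdf_difference(original, synthetic, grid))
```

Comparing whole distributions this way penalizes a synthetic workload that matches the mean response time but gets the shape of the distribution wrong.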
![Page 22: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/22.jpg)
38
Outline
• Discussion of problem
• Overview of solution
• Contributions
• Proposal
  • Demonstrate that the Distiller works
  • Examine the attributes chosen for different workload/storage system pairs
  • Show that the resulting synthetic workloads are useful
• Future Work
• Timeline
• Details of solution (time permitting)
![Page 23: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/23.jpg)
39
Learn about attributes (1)
• Determine if attributes depend on the workload.
  • My guess: Yes. Locality attributes are probably different for write-only workload on FC-60
• Determine if attributes depend on the storage system.
  • Answer: They must. Storage system with constant 2 min response time has no important attributes
  • Better objective: Compare attributes chosen for similar storage systems / system configurations.
    • How much overlap?
![Page 24: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/24.jpg)
40
Learn about attributes (2)
• Determine which set of attributes does best overall (for a given storage system configuration)
  • Average over all workloads
  • Best worst-case
  • Can either of these be used in practice for all workloads?
• Attempt to find a single set of attributes that works for almost all workloads (e.g., take union of all chosen attributes)
  • Examine complexity (e.g., number of attributes) of such a set
![Page 25: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/25.jpg)
41
Learn about attributes (3)
• Examine changes in attributes and attribute-values over time.
  • Compare traces of a file system taken in 1992, 1996, 1999, and 2002.
  • Attempt to develop scaling rules.
• Examine tradeoffs between accuracy and complexity.
• Attempt to assign a “percent contribution” to each attribute and/or attribute group.
![Page 26: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/26.jpg)
42
Outline
• Discussion of problem
• Overview of solution
• Contributions
• Proposal
  • Demonstrate that the Distiller works
  • Examine the attributes chosen for different workload/storage system pairs
  • Show that the resulting synthetic workloads are useful
• Future Work
• Timeline
• Details of solution (time permitting)
![Page 27: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/27.jpg)
43
Apply to real life
• Show that synthetic workloads can be used to make design decisions
• Show that currently available traces not adequate
• Show usefulness of “knobs”
![Page 28: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/28.jpg)
44
Synthetic workloads useful
• Show that synthetic workloads can be used in place of real workloads to make simple design decisions
  • Cache size
  • Prefetch length
  • High-water mark of write-back cache
• Complex design decisions are the basis for entire Ph.D. theses. Can’t practically reproduce at Tech.
• Use Pantheon disk simulator to simulate effects of changing above parameters
![Page 29: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/29.jpg)
45
Synthetic workloads useful (2)
• Take production workload trace
  • Simulate performance given different prefetch lengths.
  • Choose best
• Take synthetic workload based on production workload
  • Simulate performance given different prefetch lengths
  • Compare best to best for production workload
• For cache size, find best performance/$ mark.
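The comparison described above can be sketched as follows. The simulator's interface is not shown in the slides, so the two dictionaries below are made-up placeholders for simulator output (mean response time per prefetch length), and `best_setting` is a hypothetical helper name.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def best_setting(settings: Iterable[T], mean_response_time: Callable[[T], float]) -> T:
    """Return the setting with the lowest simulated mean response time."""
    return min(settings, key=mean_response_time)


# Made-up placeholder numbers standing in for simulator output (not measured results).
production_ms = {8: 5.0, 16: 3.5, 32: 4.1}
synthetic_ms = {8: 5.2, 16: 3.7, 32: 4.0}

prefetch_lengths = sorted(production_ms)
best_production = best_setting(prefetch_lengths, production_ms.__getitem__)
best_synthetic = best_setting(prefetch_lengths, synthetic_ms.__getitem__)

# The synthetic workload is useful for this decision if both pick the same setting.
same_decision = best_production == best_synthetic
print(best_production, best_synthetic, same_decision)
```

The point of the check is the decision, not the absolute numbers: the synthetic workload can be off in response time and still lead to the same choice of prefetch length.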
![Page 30: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/30.jpg)
46
![Page 31: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/31.jpg)
47
Show available traces inadequate
• Use Pantheon disk simulator to show that using the cello92 and cello02 traces to evaluate simple design decisions results in different answers.
• From this we infer that using cello92 traces to justify more complex design decisions also produces incorrect answers
![Page 32: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/32.jpg)
48
Turning “knobs” useful
• Show that turning “knobs” of compact representation is better than ad-hoc modifications to workload traces
  • Show that turning arrival time knob is better than contracting interarrival times
  • Show that turning request size knob is better than ad-hoc doubling of request size and location values.
• Evaluate turning of knobs versus removing ½ of cello I/Os based on process ID
![Page 33: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/33.jpg)
49
Future Work
• Optimality
  • Find “smallest” set of attributes per workload (e.g., set of attributes with smallest compact representation)
  • Find smallest set of attributes per storage system (if possible)
• Use chosen attributes to develop analytical model of performance
  • Formula for performance, not simulation
![Page 34: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/34.jpg)
50
Timeline
• April to June: Run Distiller on many workloads
  • Submit results to MASCOTS
• June to July: Analyze changes over different workloads / storage systems.
  • Submit results to CMG conference
• July to August
  • Find best overall set of attributes. Find best worst-case
• September to October: Attempt to develop set of attributes that works for all workloads on a given storage system
  • Submit results to SIGMETRICS and/or FAST
• November to December: Evaluate different what-if scenarios.
• February 2004: Defense
• January 2004 to February 2004: write
• March 2004 to April 2004: interview
• May 2004 to July 2004: write
• August 2004: graduate
![Page 35: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/35.jpg)
51
Outline
• Discussion of problem
• Overview of solution
• Contributions
• Proposal
• Future Work
• Timeline
• Details of solution (time permitting)
![Page 36: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/36.jpg)
52
Generating Synthetic Workload
• To generate synthetic workload, randomly choose value for each element in table
• Attribute-values put restrictions on values chosen
• Adding attribute-values reduces the difference between synthetic and production workloads
(Read/Write, Request Size, Location, Arrival Time)
(R, 1024, 42912, 10)
(W, 8192, 12493, 12)
(W, 2048, 20938, 15)
(R, 2048, 43943, 2)
(W, 8192, 98238, 11)
(W, 8192, 76232, 23)
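A minimal sketch of the simplest case: each element of the table is drawn at random, constrained only by the distribution of values observed in the corresponding column of the original trace. The slides do not give the generation code; drawing each column independently from its empirical distribution is an assumption, and `generate_synthetic` is a hypothetical name.

```python
import random
from typing import List, Tuple

Record = Tuple[str, int, int, int]


def generate_synthetic(trace: List[Record], n: int, seed: int = 0) -> List[Record]:
    """Draw each column independently from its empirical distribution."""
    rng = random.Random(seed)
    columns = list(zip(*trace))  # one tuple of observed values per column
    return [tuple(rng.choice(col) for col in columns) for _ in range(n)]


# Sample table from the slide.
trace = [
    ("R", 1024, 42912, 10),
    ("W", 8192, 12493, 12),
    ("W", 2048, 20938, 15),
    ("R", 2048, 43943, 2),
    ("W", 8192, 98238, 11),
    ("W", 8192, 76232, 23),
]

synthetic = generate_synthetic(trace, n=5, seed=1)
for record in synthetic:
    print(record)
```

Because the columns are drawn independently, any pattern within a column (such as runs) and any correlation between columns is lost; adding further attribute-values is what restricts the random choices enough to restore them.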
![Page 37: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/37.jpg)
53
Choosing Attributes Wisely
• Challenge
  • Not all attributes useful
  • Some attributes partially redundant
  • Can’t test all attributes
• My Solution
  • Group attributes
  • Evaluate whole groups at once
[Figure: Attributes: Mean Arrival Time; Arrival Time Dist.; Hurst Parameter; COV of Arrival Time; Mean Request Size; Request Size Dist.; Request Size Attrib 3; Request Size Attrib 4; Dist. of Locations; Mean run length; Jump Distance; Proximity Munge; Read/Write ratio; Markov Read/Write; R/W Attrib. #3; R/W Attrib. #4; Mean Read Size; Read Rqst. Size Dist.; Mean (R, W) Sizes; (R, W) Size Dists.; D. of (R,W) Locations; Mean R,W run length; R/W Jump Distance; R/W Proximity Munge]
![Page 38: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/38.jpg)
54
Attribute Groups
• Attributes measure one or more parameters
  • Mean Request Size → Request Size
  • Distribution of Location → Location
  • Burstiness → Interarrival Time
  • Distribution of Read Size → Request Size, Read/Write
• Attributes grouped by parameter(s) measured
  • Location = {mean location, distribution of location, locality, mean jump distance, mean run length, ...}
  • Arrival Time = {mean interarrival time, Markov model of interarrival time, Hurst parameter, etc.}
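Grouping attributes by the parameter(s) they measure can be sketched with a dictionary keyed by parameter sets. The mapping below uses a handful of attributes named on the slides; pairing "distribution of read size" with both Request Size and Read/Write is my reading of the slide layout, and the names `MEASURES` and `group_attributes` are illustrative.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List

# Which trace parameter(s) each attribute measures (illustrative subset).
MEASURES: Dict[str, FrozenSet[str]] = {
    "mean request size": frozenset({"Request Size"}),
    "distribution of location": frozenset({"Location"}),
    "mean jump distance": frozenset({"Location"}),
    "mean interarrival time": frozenset({"Arrival Time"}),
    "Hurst parameter": frozenset({"Arrival Time"}),
    # Assumed to measure two parameters at once.
    "distribution of read size": frozenset({"Request Size", "Read/Write"}),
}


def group_attributes(
    measures: Dict[str, FrozenSet[str]]
) -> Dict[FrozenSet[str], List[str]]:
    """Group attributes by the set of parameters they measure."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for attribute, parameters in measures.items():
        groups[parameters].append(attribute)
    return dict(groups)


groups = group_attributes(MEASURES)
for parameters, attributes in groups.items():
    print(sorted(parameters), attributes)
```

Keying groups by a set of parameters lets single-parameter groups such as {Location} and multi-parameter groups such as {Request Size, Read/Write} live in the same structure.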
![Page 39: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/39.jpg)
55
Attribute Groups
• Each group corresponds to a column or set of columns
  • Operation Type
  • Request Size
  • {Arrival Time, Location}
• Measures patterns within column(s)
Workload:
(Read/Write, Request Size, Location, Arrival Time)
(R, 1024, 42912, 10)
(W, 8192, 12493, 12)
(W, 2048, 20938, 15)
(R, 2048, 43943, 2)
(W, 8192, 98238, 11)
(W, 8192, 76232, 23)
![Page 40: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/40.jpg)
56
Do I need (more) attributes from the {Arrival Time} group?
• Idea #1: Add “best” attribute from {Arrival Time} and measure improvement
  • Amount of improvement implies potential benefit
[Figure: two workloads, “Current Attributes” and “Attributes for Test”, each with columns R/W, RS, Loc, AT; columns are labeled “Current” or “Perfect”. Sample rows: R, 1024, 10242; W, 2048, 11224; R, 1024, 10252]
![Page 41: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/41.jpg)
57
Problem with idea #1
• Errors involving other parameters can interfere
  • Very random reads can overshadow moderate queuing effects
[Figure: two workloads, “Current Attributes” and “Attributes for Test”, each with columns R/W, RS, Loc, AT; columns are labeled “Current” or “Perfect”]
58
Idea #2 --- Idea #1 “backwards”
• Look at a synthetic workload in which everything except Arrival Time is “perfect”.
  • Change in performance implies importance of group.
[Figure: two workloads with columns R/W, RS, Loc, AT, labeled “Current Arrival Time Attributes” and “Everything Perfect”; column labels: “Perfect”, “Current”, “Production Workload”, “Workload Trace”]
![Page 43: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/43.jpg)
59
Problem with idea #2
• Workload on left is missing not only {Arrival Time}
  • Also missing {Arrival Time, Request Size}, {Arrival Time, Location} and {Arrival Time, Operation Type}
  • Cause of any difference not clear
[Figure: two workloads with columns R/W, RS, Loc, AT, labeled “Current Operation Type Attributes” and “Workload Trace”; column labels: “Production Workload”, “Current”]
![Page 44: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/44.jpg)
60
Solution
• Remove {Arrival Time, Request Size}, {Arrival Time, Location} and {Arrival Time, Operation Type} from workload trace by “rotating” arrival times.
  • Only difference between workloads is {Arrival Time}
[Figure: two workloads with columns R/W, RS, Loc, AT, labeled “Current Operation Type Attributes” and “‘Rotated’ Arrival Time”; column labels: “Production Workload”, “Current”]
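The slides do not define "rotating". One plausible reading, assumed here, is a cyclic shift of a single column relative to the others: the sequence of values within the column is preserved (up to the wrap-around), but its alignment with the other columns is broken. For arrival times one would probably rotate interarrival gaps rather than absolute timestamps; the sketch below simply treats the last field as the value to rotate.

```python
from typing import List, Tuple

Record = Tuple[str, int, int, int]


def rotate_column(trace: List[Record], column: int, shift: int) -> List[Record]:
    """Cyclically shift one column; all other columns stay in place."""
    values = [record[column] for record in trace]
    k = shift % len(values)
    rotated = values[k:] + values[:k]
    return [
        record[:column] + (value,) + record[column + 1:]
        for record, value in zip(trace, rotated)
    ]


trace = [
    ("R", 1024, 42912, 10),
    ("W", 8192, 12493, 12),
    ("W", 2048, 20938, 15),
]

# Shift the last column by one position.
for record in rotate_column(trace, column=3, shift=1):
    print(record)
```

Under this reading, the rotated trace keeps every single-column pattern intact, so comparing it with the original isolates the contribution of the cross-column groups.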
![Page 45: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/45.jpg)
61
Process
• Add {Operation Type} attributes until two workloads below are representative
• Repeat for other attribute groups
[Figure: two workloads with columns R/W, RS, Loc, AT, labeled “Current Operation Type Attributes” and “‘Rotated’ Operation Types”; column labels: “Production Workload”, “Current”]
![Page 46: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/46.jpg)
62
Hints
• If Distiller is unable to find attributes for a particular group, it identifies the deficiency
  • Helps people develop new attributes
• Attributes for multi-parameter groups must be compatible with single-parameter groups
  • {Operation Type, Location} attribute must maintain same properties as chosen {Operation Type} and {Location} parameters
![Page 47: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/47.jpg)
63
End Of Talk
![Page 48: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/48.jpg)
64
Problem:
• Lack of traces for researchers
  • …. Papers use same … traces
• Traces used may or may not be representative of actual production workloads
• When traces not sufficient, really bad synthetic workloads used instead
• We don’t know how to easily produce representative synthetic workload
  • Lack of synthetic workload generation ability suggests lack of understanding of disk array and storage system interactions
![Page 49: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/49.jpg)
65
Proposed Solution
• Improve our ability to generate synthetic workloads
• (Discuss previous work)
![Page 50: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/50.jpg)
66
Workload Characteristic
• Characteristic: A property of a workload (or workload trace) that can be measured.
  • 27% reads
  • Mean request size of 8KB
• Must be property of workload alone
  • Response time is not a workload characteristic, but a characteristic of both workload and storage system
• Must be concrete measurable property.
  • “Burstiness” and “locality” too vague.
• Also called “attribute-values”
![Page 51: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/51.jpg)
67
Attributes
• Attribute: The “name” of a characteristic
  • Attribute → Characteristic
  • Eye color → blue eyes
  • Read percentage → 27% reads
  • Mean request size → mean size: 8KB
• Hence, characteristics also called “attribute-values”
![Page 52: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/52.jpg)
68
How the Distiller works
• Partition known attributes into groups
  • All characteristics in each group contain similar information
• Choose a “complete” set of characteristics from each group.
  • i.e., choose a set of characteristics that contains all the necessary information from the group
![Page 53: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/53.jpg)
69
$21,000 question
• What attribute-values must a synthetic workload share with the production workload in order to be representative?
  • Do the attributes depend on the workload?
  • Do the attributes depend on the storage system?
    • If so, how can we find them easily?
    • If not, what are they?
![Page 54: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/54.jpg)
70
Trivial “solution” doesn’t work
• Trivial solution: Use many attribute-values
• Problems with trivial solution
  • Many attribute-values contain irrelevant info.
  • Many attribute-values contain duplicate info.
  • High-level description too large and complex
    • Negates advantages of synthetic workload
  • Generating synthetic workload too difficult
    • Obvious algorithms for generating attribute-values often interfere with each other.
![Page 55: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/55.jpg)
71
Challenges of Useful Solution
• Solution: Choose small set of “important” attribute-values
  • That is, attribute-values that have the most impact on evaluation
• Challenges
  • Estimating impact of single attribute-value on evaluation
  • Finding small set of attribute-values with “disjoint” information
![Page 56: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/56.jpg)
72
Goal of Distiller
• Given a workload and storage system,
• automatically find a set of attributes, so
• synthetic workloads with the same values
• will have evaluation similar to original.
[Figure: Original Workload → Attribute List → Synthetic Workload, with a CDF of Response Time plot. Sample trace records: (R,1024,120932,124) (W,8192,120834,126) (W,8192,120844,127) (R,2048,334321,131 ...]
![Page 57: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/57.jpg)
73
High-level approach
• Divide and Conquer
  • Partition attributes into groups according to “type of information”
    • Recall some attributes describe similar info.
  • Find a set of attribute-values that contains all the information for a particular group
• (No, it’s not that simple …)
![Page 58: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/58.jpg)
74
Workload
• I/O request has four parameters
  • Read/Write type
  • Request Size
  • Location
  • Arrival Time
    • shown in ms
• Workload is a series of I/O requests
  • Trace can be viewed as a table with four columns
(Read/Write, Request Size, Location, Arrival Time)
(R, 1024, 42912, 10)
(W, 8192, 12493, 12)
(W, 2048, 20938, 15)
(R, 2048, 43943, 2)
(W, 8192, 98238, 11)
(W, 8192, 76232, 23)
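To make the "table with four columns" view concrete, the sketch below parses records written in the parenthesized form used on the slides into typed rows. The `Request` type and `parse_trace` function are illustrative names, and the textual format is taken from the slide examples rather than from any real trace file format.

```python
import re
from typing import List, NamedTuple


class Request(NamedTuple):
    op: str          # "R" or "W"
    size: int        # request size
    location: int    # location on the storage device
    arrival_ms: int  # arrival time, shown in ms on the slide


# Matches records such as "(R, 1024, 42912, 10)".
_RECORD = re.compile(r"\(\s*([RW])\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)")


def parse_trace(text: str) -> List[Request]:
    """Parse every well-formed record found in the text."""
    return [
        Request(op, int(size), int(location), int(arrival))
        for op, size, location, arrival in _RECORD.findall(text)
    ]


rows = parse_trace("(R, 1024, 42912, 10)(W, 8192, 12493, 12)(W, 2048, 20938, 15)")
for row in rows:
    print(row)

# Column view of the same table.
sizes = [row.size for row in rows]
print(sizes)
```

Once the trace is a list of rows, each attribute becomes a function over one or more columns of this table.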
![Page 59: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/59.jpg)
75
“Engineering” Contributions
• Finding representative synthetic workloads becomes practical
  • Basis for evaluations when traces are unavailable
  • Basis for “what-if” evaluations
• Provides basis for workload similarity metric
  • “Table-based” models
• Highlight what workload features a storage system handles best
  • Help configure storage system for new workload
![Page 60: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/60.jpg)
76
![Page 61: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/61.jpg)
77
Apply to “real life”
• Attempt to generate a representative synthetic workload when no trace exists
  • Choose workload trace and “hide” it
  • Use lessons from previous slides to choose attributes based on similar workloads
  • Compare synthetic workload to trace
• Compare “what-if” workload based on chosen attributes to ad-hoc “what-if” workload
  • Play workload twice as fast
  • “Bootstrapping”
• Attempt to find attributes that determine whether to use RAID 1/0 or RAID 5
![Page 62: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/62.jpg)
78
Proposal: Apply to Real Life (2)
• Attempt to build “table-based” model of performance
  • n-dimensional table
  • Each axis represents one attribute
  • Fill element (w, x, y, …) with performance of workload with attribute-values w, x, y, …
  • Given new workload
    • Compute attribute-values w, x, y, …
    • Value in corresponding table element is estimate of performance
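A table-based model of this kind can be sketched as a dictionary keyed by discretized attribute-values. The slides do not say how the axes are discretized, so the fixed bin widths are an assumption, and the class name, method names, and numbers below are illustrative placeholders.

```python
from typing import Dict, Optional, Sequence, Tuple


class TableModel:
    """n-dimensional lookup table: one axis per attribute."""

    def __init__(self, bin_widths: Sequence[float]) -> None:
        self.bin_widths = tuple(bin_widths)
        self.table: Dict[Tuple[int, ...], float] = {}

    def _cell(self, values: Sequence[float]) -> Tuple[int, ...]:
        # Map each attribute-value to the index of its bin on that axis.
        return tuple(int(v // w) for v, w in zip(values, self.bin_widths))

    def record(self, values: Sequence[float], performance: float) -> None:
        """Fill the element for a workload with these attribute-values."""
        self.table[self._cell(values)] = performance

    def estimate(self, values: Sequence[float]) -> Optional[float]:
        """Look up the element for a new workload; None if never filled."""
        return self.table.get(self._cell(values))


# Axes: (read percentage, mean request size in bytes); placeholder numbers.
model = TableModel(bin_widths=[10.0, 1024.0])
model.record((67.0, 8192.0), performance=4.2)

print(model.estimate((65.0, 8500.0)))  # falls in the same cell as (67, 8192)
print(model.estimate((20.0, 8192.0)))  # cell never filled
```

The number of cells grows with every added axis, which is one reason a small set of chosen attributes matters for this kind of model.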
![Page 63: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/63.jpg)
79
Problem
• Both alternatives have problems:
  • Workload traces
    • Companies don’t like to give them out
    • Don’t always meet the researcher’s needs
  • Synthetic workloads
    • Difficult and tedious to generate correctly
![Page 64: 1 Thesis Proposal Zachary Kurmas (v4.0– 24 April 03)](https://reader030.vdocuments.mx/reader030/viewer/2022032800/56649d385503460f94a11197/html5/thumbnails/64.jpg)
80
Overview
• Motivation: Storage system design studies and automated management systems require workloads to drive evaluations
• Problem: Neither traces of production workloads nor simple synthetic workloads are sufficient to drive experimental evaluation
• My solution: Improve the quality of synthetic storage workloads by automatically determining what properties synthetic workloads must share with the production workloads they model