MINERVA: an automated resource provisioning tool for large-scale storage systems

G. Alvarez, E. Borowsky, S. Go, T. Romer, R. Becker-Szendy, R. Golding, A. Merchant, M. Spasojevic, A. Veitch, J. Wilkes

Uploaded by alan-hazelton on 11-Dec-2015

Large Scale Storage Systems

► Very difficult to configure and design
  - 10 to 100s of host computers
  - 10 to 100s of storage devices
  - 10 to 1000s of disks/logical volumes
  - Terabytes of capacity
► Meet throughput demands
► Maximize capacity utilization
► Automation would be nice…

MINERVA

► Subdivide problem into three stages
  - Choose correct device set
  - Choose correct configuration parameters
  - Map user data onto devices
► NP-hard
► Architectural elements
  - Declarative descriptions of storage workload requirements
  - Constraint-based problem representation
  - Optimization strategies and heuristics
  - Analytic performance models

MINERVA Inputs

► Workload description
  - Data type descriptions and access patterns
  - Two types:
    ► Stores: logically contiguous data (a database table or file system)
    ► Streams: sequences of accesses on a store (access pattern and throughput)
► Device descriptions
  - Disk information (number, size, and type)
  - Array information (number of LUNs)
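The two input descriptions can be pictured as plain records. This is a minimal Python sketch under stated assumptions: the class and field names (`Store`, `Stream`, `DiskType`, `ArrayDescription`, `Workload`, `read_iops`, `sequential`, `max_luns`, and so on) are hypothetical, chosen only to mirror the bullets above, and are not MINERVA's actual input format.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical record types mirroring the slide's inputs (not MINERVA's format).


@dataclass(frozen=True)
class Store:
    """Logically contiguous data, e.g. a database table or file system."""
    name: str
    size_gb: float


@dataclass(frozen=True)
class Stream:
    """A sequence of accesses on one store: access pattern plus throughput."""
    store: str          # name of the store being accessed
    read_iops: float    # average read requests per second
    write_iops: float   # average write requests per second
    sequential: bool    # True for a mostly sequential access pattern


@dataclass(frozen=True)
class DiskType:
    model: str
    size_gb: float


@dataclass
class ArrayDescription:
    """Device description: disks (type and number) plus the array's LUN limit."""
    disk: DiskType
    disk_count: int
    max_luns: int

    def raw_capacity_gb(self) -> float:
        return self.disk.size_gb * self.disk_count


@dataclass
class Workload:
    stores: List[Store] = field(default_factory=list)
    streams: List[Stream] = field(default_factory=list)

    def total_iops(self, store: str) -> float:
        """Sum the request rates of every stream that accesses `store`."""
        return sum(s.read_iops + s.write_iops for s in self.streams if s.store == store)
```

Keeping stores and streams separate lets several streams (for example, a scan and an update pattern) refer to the same store.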

MINERVA Objects

MINERVA Outputs

► Assignment
  - Device set taken from device descriptions
  - Mapping of stores to devices
  - $2^{n} n^{m}$ possible configurations
    ► $O((2m)^{m})$ complexity
  - Goal
    ► Minimum cost that meets performance requirements
► Effector tool
  - Takes assignment as input
  - Automated configuration of physical devices
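To see why the search space is intractable, the configuration count can be evaluated directly. The slide does not define its symbols; this sketch assumes $n$ is the number of candidate devices (each either used or not, giving $2^{n}$ device sets) and $m$ the number of stores (each mapped to one of $n$ devices, giving $n^{m}$ mappings). The function name `configuration_count` is hypothetical.

```python
def configuration_count(n_devices: int, m_stores: int) -> int:
    """Assumed reading of the slide: 2^n device subsets times n^m store mappings."""
    return 2 ** n_devices * n_devices ** m_stores


# Even small inputs explode quickly.
for n, m in [(2, 2), (4, 10), (8, 50)]:
    print(f"n={n:2d} m={m:3d} -> {configuration_count(n, m):.3e} configurations")
```

Exhaustive enumeration is hopeless beyond toy sizes, which is why the architecture splits the problem into stages driven by heuristics.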

Storage System Lifecycle

Architecture
► Array allocation
  - Tagger
    ► Assigns a preferred RAID level
  - Allocator
    ► Determines the number of arrays
► Array configuration
  - Array designer
    ► Configures the arrays
► Store assignment
  - Solver
    ► Assigns stores to LUNs
  - Optimizer
    ► Prunes unused resources and balances load
► Evaluator
  - Verifies the design with analytic models

Architecture

MINERVA Process

Analytical Device Models

► Determines feasibility
► Predicted throughput error: 20%
► Streams
  - Modeled as an ON-OFF Markov-modulated Poisson process
► Arrays
  - Array controller, bus connection, disks
► Case study
  - HP SureStore Model 30/FC High Availability disk array
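An ON-OFF Markov-modulated Poisson process alternates between an ON state, in which requests arrive as a Poisson process, and a silent OFF state, with exponentially distributed time spent in each state. The sketch below simulates such a stream and computes its long-run mean rate, `rate_on * mean_on / (mean_on + mean_off)`. The parameter names (`rate_on`, `mean_on_s`, `mean_off_s`) are hypothetical, and this is an illustration of the process, not MINERVA's device model.

```python
import random
from typing import List


def mean_rate(rate_on: float, mean_on_s: float, mean_off_s: float) -> float:
    """Long-run request rate: ON-state rate times the fraction of time spent ON."""
    return rate_on * mean_on_s / (mean_on_s + mean_off_s)


def simulate_on_off(rate_on: float, mean_on_s: float, mean_off_s: float,
                    horizon_s: float, seed: int = 0) -> List[float]:
    """Return request arrival times of an ON-OFF MMPP over [0, horizon_s)."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while t < horizon_s:
        # ON period: exponential length, Poisson arrivals at rate_on inside it.
        on_end = min(t + rng.expovariate(1.0 / mean_on_s), horizon_s)
        next_arrival = t + rng.expovariate(rate_on)
        while next_arrival < on_end:
            arrivals.append(next_arrival)
            next_arrival += rng.expovariate(rate_on)
        # OFF period: exponential length with no arrivals. Discarding the
        # overshooting arrival is safe because exponential gaps are memoryless.
        t = on_end + rng.expovariate(1.0 / mean_off_s)
    return arrivals
```

The mean rate alone hides burstiness: two streams with equal averages can load a disk very differently if their ON periods overlap, which is what the ON-OFF structure lets a model account for.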

Tagger

► Choose storage class based on access pattern
  - RAID 1/0 or RAID 5
► Rule based
  1. Determines capacity-bound stores
  2. Estimates average number of I/O operations per second (IOPS)

Capacity Rules

► Calculated per GB of storage
► Capacity-bound stores are tagged RAID 5
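One way to picture a per-GB capacity rule is to compare a store's access density (IOPS per GB) with the density a disk can supply. This is a hedged sketch: the comparison itself and the disk figures `DISK_GB` and `DISK_IOPS` are hypothetical illustrations, not the thresholds MINERVA uses, and the function names are ours.

```python
from typing import Optional

# Hypothetical per-disk figures for illustration only; not MINERVA's values.
DISK_GB = 18.0
DISK_IOPS = 100.0


def is_capacity_bound(size_gb: float, iops: float) -> bool:
    """Assumed rule: at or below a disk's own IOPS-per-GB density, the store's
    bytes rather than its requests decide how many disks it occupies."""
    return iops / size_gb <= DISK_IOPS / DISK_GB


def capacity_tag(size_gb: float, iops: float) -> Optional[str]:
    """Tagger rule 1 (sketch): capacity-bound stores get RAID 5; None means
    'not decided here, fall through to the IOPS rule'."""
    return "RAID5" if is_capacity_bound(size_gb, iops) else None
```

RAID 5 suits such stores because it spends one parity disk per group instead of mirroring every disk, so it needs fewer disks for the same usable capacity.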

IOPS Estimation

► Choose the RAID level that generates the fewest IOPS per disk
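The rule can be sketched by counting back-end disk I/Os per level. This sketch assumes textbook write costs (RAID 1/0 writes both mirrors; a small random RAID 5 write reads and rewrites data and parity; a sequential RAID 5 write fills whole stripes with one parity block per `disks - 1` data blocks) and the same disk count for both levels. The function names and the tie-break toward RAID 5 are hypothetical choices, not MINERVA's code.

```python
from typing import Dict


def per_disk_iops(reads: float, writes: float, sequential: bool,
                  level: str, disks: int) -> float:
    """Back-end disk I/Os per second, spread evenly over `disks` disks."""
    if level == "RAID10":
        backend = reads + 2 * writes               # every write hits both mirrors
    elif level == "RAID5":
        if sequential:
            # Full-stripe writes: one parity block per (disks - 1) data blocks.
            backend = reads + writes * disks / (disks - 1)
        else:
            # Small random write: read old data and parity, write new data and parity.
            backend = reads + 4 * writes
    else:
        raise ValueError(f"unknown RAID level {level!r}")
    return backend / disks


def iops_tag(reads: float, writes: float, sequential: bool, disks: int = 4) -> str:
    """Pick the level with the fewest per-disk IOPS. On a tie, min() keeps the
    first key, RAID 5, which is cheaper in capacity (an assumed tie-break)."""
    costs: Dict[str, float] = {
        level: per_disk_iops(reads, writes, sequential, level, disks)
        for level in ("RAID5", "RAID10")
    }
    return min(costs, key=costs.get)
```

Under these assumptions random-write-heavy streams land on RAID 1/0, while sequential writers and read-only streams stay on RAID 5.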

Allocator

► Finds a “reasonable” set of arrays
► 3 steps
  - Consider type and number of arrays
  - Consider array configurations
  - Consider LUN divisions and RAID configurations

Allocator models

► Can only use analytic device models
► Ignores stream phasing
► Rillifier handles large resource demands
  - Distributes the workload among different LUNs
  - Stores with excessive capacity requirements become shards
  - Streams with excessive throughput requirements become rills
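A minimal sketch of the splitting idea: a store too large for one LUN is cut into shards, and its stream's throughput is divided into matching rills so each piece fits a single LUN. The per-LUN limits (`lun_max_gb`, `lun_max_mbs`) and the equal-split policy are assumptions for illustration; the slides do not say how MINERVA sizes shards and rills.

```python
import math
from typing import List, Tuple


def rillify(size_gb: float, throughput_mbs: float,
            lun_max_gb: float, lun_max_mbs: float) -> List[Tuple[float, float]]:
    """Split one store and its stream into k equal (shard_gb, rill_mbs) pieces,
    where k is the smallest count that respects both hypothetical LUN limits."""
    k = max(1,
            math.ceil(size_gb / lun_max_gb),          # shards needed for capacity
            math.ceil(throughput_mbs / lun_max_mbs))  # rills needed for throughput
    return [(size_gb / k, throughput_mbs / k)] * k
```

Splitting both dimensions by the same factor keeps each rill paired with the shard it reads, so every piece can be placed independently by the solver.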

Allocator Search

► Uses a branch-and-bound strategy
  - Determines the number and types of arrays
  - Chooses the lowest-cost set that supports the workload
► Searches array configurations
  - Starts with mixed arrays
  - Iteratively converts arrays to dedicated types
  - Branch and bound biased toward dedicated types
    ► Searches in reverse order, starting with dedicated types
► Calls the array designer with each configuration
  - If the array designer fails, the search continues
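A generic branch-and-bound over "how many arrays of each type" shows the shape of this search. This is a sketch under stated assumptions: the `ArrayType` fields and costs are hypothetical, the bound is simply the cost accumulated so far (valid because costs are non-negative), and `designer_ok` is a hypothetical stand-in for the array designer. It does not reproduce MINERVA's mixed-to-dedicated search order.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Counts = Tuple[int, ...]


@dataclass(frozen=True)
class ArrayType:
    name: str
    cost: float
    capacity_gb: float
    iops: float


def allocate(types: Sequence[ArrayType], need_gb: float, need_iops: float,
             designer_ok: Callable[[Counts], bool] = lambda counts: True
             ) -> Tuple[Optional[Counts], float]:
    """Cheapest number of arrays per type that covers the demand and that the
    (hypothetical) array designer accepts."""
    best: Optional[Counts] = None
    best_cost = math.inf

    def limit(t: ArrayType) -> int:
        # Enough arrays of this type alone to cover the whole demand.
        return math.ceil(max(need_gb / t.capacity_gb, need_iops / t.iops))

    def search(i: int, counts: Counts, cost: float, gb: float, iops: float) -> None:
        nonlocal best, best_cost
        if cost >= best_cost:
            return  # bound: this branch cannot beat the incumbent
        if i == len(types):
            if gb >= need_gb and iops >= need_iops and designer_ok(counts):
                best, best_cost = counts, cost  # new incumbent
            return  # if the designer rejects, the search simply continues
        t = types[i]
        for n in range(limit(t) + 1):  # branch on how many arrays of type i
            search(i + 1, counts + (n,), cost + n * t.cost,
                   gb + n * t.capacity_gb, iops + n * t.iops)

    search(0, (), 0.0, 0.0, 0.0)
    return best, best_cost
```

For example, with a cheap small array type and a larger expensive one, the cheapest mix may combine both; if the designer rejects mixed configurations, the search falls back to the next-cheapest accepted one.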

Array Designer

► Determines LUN sizes and array parameters
► Starts with the simple case of equal-size LUNs
  - Also considers a greedy configuration
► Workload description determines LUN size
► Relies on the optimizer to take care of unused capacity
► Target disks assigned round-robin across buses
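Round-robin assignment across buses can be sketched by interleaving each bus's disks and then cutting the interleaved order into LUNs. The disk and bus names and the function names are hypothetical; this illustrates the idea of spreading each LUN over buses, not the array designer's actual procedure.

```python
from typing import List, Sequence


def interleave_by_bus(disks_per_bus: Sequence[Sequence[str]]) -> List[str]:
    """Order disks round-robin over buses: bus0[0], bus1[0], ..., bus0[1], ..."""
    queues = [list(disks) for disks in disks_per_bus]
    order: List[str] = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.pop(0))
    return order


def assign_disks_to_luns(disks_per_bus: Sequence[Sequence[str]],
                         lun_widths: Sequence[int]) -> List[List[str]]:
    """Give each LUN consecutive disks from the round-robin order, so each
    LUN's disks are spread over as many buses as possible."""
    order = interleave_by_bus(disks_per_bus)
    if sum(lun_widths) > len(order):
        raise ValueError("not enough disks for the requested LUNs")
    luns: List[List[str]] = []
    pos = 0
    for width in lun_widths:
        luns.append(order[pos:pos + width])
        pos += width
    return luns
```

With two buses and equal-size LUNs of width 2, every LUN gets one disk on each bus, so no single bus carries a LUN's entire load.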

Solver

► Assigns stores to LUNs
► Multidimensional constrained bin packing
  - Uses analytic device models to evaluate the objective function
  - Constraints:
    ► LUN capacity
    ► LUN phased utilization
    ► Array bus bandwidth
    ► Array controller utilization
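The four constraints make this a multidimensional bin-packing check: a store fits on a LUN only if the LUN and its array both stay within limits. This first-fit sketch assumes hypothetical record types and simply adds utilizations, ignoring the stream phasing that MINERVA's "phased utilization" accounts for; the utilization limits of 1.0 are also assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StoreDemand:
    name: str
    gb: float          # capacity needed on the LUN
    lun_util: float    # fraction of the LUN's disk time (added, no phasing)
    bus_mbs: float     # bandwidth over the array's bus
    ctrl_util: float   # fraction of the array controller's time


@dataclass
class Lun:
    name: str
    array: str
    capacity_gb: float
    used_gb: float = 0.0
    util: float = 0.0


@dataclass
class Array:
    bus_mbs_limit: float
    bus_mbs: float = 0.0
    ctrl_util: float = 0.0


def fits(s: StoreDemand, lun: Lun, arr: Array) -> bool:
    """Check LUN capacity, LUN utilization, bus bandwidth, controller utilization."""
    return (lun.used_gb + s.gb <= lun.capacity_gb
            and lun.util + s.lun_util <= 1.0
            and arr.bus_mbs + s.bus_mbs <= arr.bus_mbs_limit
            and arr.ctrl_util + s.ctrl_util <= 1.0)


def first_fit(stores: List[StoreDemand], luns: List[Lun],
              arrays: Dict[str, Array]) -> Optional[Dict[str, str]]:
    """Place each store on the first LUN where it fits, updating LUN and array
    totals; return None if some store cannot be placed."""
    placement: Dict[str, str] = {}
    for s in stores:
        for lun in luns:
            arr = arrays[lun.array]
            if fits(s, lun, arr):
                lun.used_gb += s.gb
                lun.util += s.lun_util
                arr.bus_mbs += s.bus_mbs
                arr.ctrl_util += s.ctrl_util
                placement[s.name] = lun.name
                break
        else:
            return None
    return placement
```

Bus and controller limits are tracked per array rather than per LUN, because all LUNs on an array share them.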

Solver Heuristics

► Simple Random
  - 50 random cases using first fit
► Toyoda
  - Best fit using a gradient function
► Objective function combined with economic utilization
► (1/penalty - lun_cost)
  - Favors LUNs already in use or of low cost
► LUNs filled in order of increasing cost
  - Minimizes resource contention
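The Simple Random heuristic can be sketched as: shuffle the stores, run first fit with LUNs tried in increasing cost order, repeat 50 times, and keep the cheapest feasible result. Assumptions: only capacity is checked, a LUN is described by a hypothetical `(name, capacity_gb, cost)` tuple, and a placement is scored by the summed cost of the LUNs it uses. The Toyoda gradient heuristic is not shown.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical LUN description: (name, capacity_gb, cost).
LunSpec = Tuple[str, float, float]


def first_fit(sizes: Dict[str, float], order: Sequence[str],
              luns: Sequence[LunSpec]) -> Optional[Dict[str, str]]:
    """Place stores in `order` on the first LUN, cheapest first, with room left.
    Only capacity is checked; returns None if some store does not fit."""
    free = {name: cap for name, cap, _ in luns}
    by_cost = sorted(luns, key=lambda lun: lun[2])
    placement: Dict[str, str] = {}
    for store in order:
        for name, _, _ in by_cost:
            if sizes[store] <= free[name]:
                free[name] -= sizes[store]
                placement[store] = name
                break
        else:
            return None
    return placement


def simple_random(sizes: Dict[str, float], luns: Sequence[LunSpec],
                  trials: int = 50, seed: int = 0
                  ) -> Tuple[Optional[Dict[str, str]], float]:
    """Run first fit on `trials` random store orders and keep the cheapest result."""
    rng = random.Random(seed)
    cost = {name: c for name, _, c in luns}
    stores: List[str] = list(sizes)
    best: Optional[Dict[str, str]] = None
    best_cost = float("inf")
    for _ in range(trials):
        rng.shuffle(stores)
        placement = first_fit(sizes, stores, luns)
        if placement is None:
            continue
        total = sum(cost[name] for name in set(placement.values()))
        if total < best_cost:
            best, best_cost = placement, total
    return best, best_cost
```

First fit is sensitive to input order: placing a mid-sized store first can strand space and force an expensive LUN into use, which is what repeated random orderings try to avoid.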

Solver Heuristics 2

► ToyodaWeighted
  - Maps gradients against remaining available resources
  - Maps stores to LUNs such that utilization is balanced
  - Objective_function * cos(α)
  - Objective_function = max_lun_cost - lun_cost
    ► Minimizes cost
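A hedged reading of the weighting: take α as the angle between the store's demand vector and a LUN's remaining-resource vector, so cos(α) is near 1 when the free resources have the same shape as the demand, and multiply it by `max_lun_cost - lun_cost`. The choice of resource dimensions, the skip rule for LUNs that cannot fit, and the function names are assumptions, not MINERVA's definition.

```python
import math
from typing import Dict, Optional, Sequence


def cos_angle(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two resource vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def toyoda_weighted_pick(demand: Sequence[float],
                         remaining: Dict[str, Sequence[float]],
                         lun_cost: Dict[str, float]) -> Optional[str]:
    """Choose the LUN maximizing (max_lun_cost - lun_cost) * cos(alpha).
    LUNs without room for the demand in every dimension are skipped."""
    max_cost = max(lun_cost.values())
    best: Optional[str] = None
    best_score = -math.inf
    for lun, free in remaining.items():
        if any(d > f for d, f in zip(demand, free)):
            continue
        score = (max_cost - lun_cost[lun]) * cos_angle(demand, free)
        if score > best_score:
            best, best_score = lun, score
    return best
```

Between equally priced LUNs, the one whose free capacity and utilization match the store's demand profile wins, which tends to use up resources in all dimensions evenly; a sufficiently cheaper LUN can still win despite a worse fit.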

Toyoda and ToyodaWeighted

Optimizer

► Reruns the solver against the configuration
  - Reduces the number of required arrays
► Runs ToyodaWeighted with a new objective function
  - Objective_value = 1 - lun_utilization
  - Assigns stores to underutilized LUNs
► Variations
  - Simple Random
    ► Randomized first fit; chooses the result with the lowest utilization variance
  - Simple Balanced
    ► Round-robin first fit, based on capacity and utilization constraints
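The objective `1 - lun_utilization` can be illustrated with a greedy pass that sends each store to the LUN with the most headroom, plus the utilization variance that the Simple Random variant uses to compare results. This is a sketch, not the optimizer's code: the largest-first ordering, the single utilization dimension, and the function names are assumptions.

```python
from statistics import pvariance
from typing import Dict, List, Tuple


def assign_to_underutilized(store_util: Dict[str, float],
                            luns: List[str]) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Greedy sketch: stores, largest utilization first, each go to the LUN with
    the highest objective value 1 - lun_utilization (the most headroom)."""
    util = {lun: 0.0 for lun in luns}
    placement: Dict[str, str] = {}
    for store, u in sorted(store_util.items(), key=lambda kv: -kv[1]):
        target = max(util, key=lambda lun: 1.0 - util[lun])
        # If the emptiest LUN cannot take the store, no other LUN can either.
        if util[target] + u > 1.0:
            raise ValueError(f"store {store!r} does not fit on any LUN")
        util[target] += u
        placement[store] = target
    return placement, util


def utilization_variance(util: Dict[str, float]) -> float:
    """Spread of LUN utilizations; lower means better balanced."""
    return pvariance(list(util.values()))
```

Placing large stores first leaves the small ones to even out the remaining imbalance, a common trick in greedy load balancing.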

Clusterer

► Addresses performance scaling issues
  - With many stores, runtime grew to days
► Combines multiple stores into a cluster
  - The cluster is mapped instead of its stores
► Cluster rules based on observation
  - 10 MB/s bandwidth
  - 2 GB size
► Increases cost by ~3%
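A minimal clustering sketch, assuming the two observed figures act as per-cluster limits (the slide does not say exactly how they are applied) and using a simple next-fit pass over stores sorted by size. The `(name, gb, mbs)` tuple shape and the packing policy are hypothetical.

```python
from typing import List, Tuple

MAX_CLUSTER_MBS = 10.0  # bandwidth figure quoted on the slide
MAX_CLUSTER_GB = 2.0    # size figure quoted on the slide


def cluster_stores(stores: List[Tuple[str, float, float]]) -> List[List[str]]:
    """Pack (name, gb, mbs) stores into clusters under both limits using next fit
    over stores sorted by size. A store already over a limit ends up alone."""
    clusters: List[List[str]] = []
    current: List[str] = []
    gb = mbs = 0.0
    for name, s_gb, s_mbs in sorted(stores, key=lambda s: s[1]):
        if current and (gb + s_gb > MAX_CLUSTER_GB or mbs + s_mbs > MAX_CLUSTER_MBS):
            clusters.append(current)  # close the full cluster
            current, gb, mbs = [], 0.0, 0.0
        current.append(name)
        gb += s_gb
        mbs += s_mbs
    if current:
        clusters.append(current)
    return clusters
```

Fewer, larger objects shrink the solver's input, trading a small loss of packing precision for a large cut in runtime.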

Evaluation

► Analytic model performance predictions
► Evaluate sensitivity to workload changes
► Effect of design changes
► Measure live system

Model Validation

► Based on a single FC-30
► Ran performance tests on the physical system
► Compared results to model predictions
► Results showed a mean prediction error of +5.4%
  - Range of [-11%, +19%]
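The reported statistics are a mean and a range of signed relative errors. A minimal sketch of that computation, assuming the convention error = (predicted - measured) / measured, so positive values mean the model over-predicts; the slide does not state its sign convention, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def relative_errors(predicted: Sequence[float], measured: Sequence[float]) -> List[float]:
    """Signed relative error per test (assumed sign: positive = over-prediction)."""
    if len(predicted) != len(measured):
        raise ValueError("need one prediction per measurement")
    return [(p - m) / m for p, m in zip(predicted, measured)]


def summarize(errors: Sequence[float]) -> Dict[str, float]:
    """Mean and range of the errors, the statistics reported on the slide."""
    return {"mean": mean(errors), "min": min(errors), "max": max(errors)}
```

A mean near zero can hide large individual errors in both directions, which is why the range is reported alongside it.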

Safety and Sensitivity

► Examined scaling of workload parameters
► Start with a baseline workload, then modify a single parameter
► Baseline chosen to exhibit 3 effects
  - Mixing of appropriate RAID levels
  - Requiring a non-trivial number of arrays (2+)
  - Balanced store performance requirements

Scaling Store Size and Bandwidth

► Store size scaling
  - System becomes capacity bound
    ► Creates RAID 5 LUNs
  - System size scales linearly with store size
► Bandwidth scaling
  - Ratio of RAID 1/0 to RAID 5 increases linearly

Scaling Number of Stores

► Number of arrays scales linearly with the number of stores

Running time

► Quadratic increase with number of stores

Workload Variability

► Workload attributes drawn randomly from a log-normal distribution
  - Baseline values = distribution means
► Capacity utilization drops with increased variability
► RAID 5 LUNs increase
► Segmentation increases
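Drawing attributes whose distribution mean equals the baseline value takes one conversion: for a log-normal with mean $E$ and coefficient of variation $c$, the underlying normal has $\sigma^{2} = \ln(1 + c^{2})$ and $\mu = \ln E - \sigma^{2}/2$. The sketch assumes variability is controlled by `cv`; the slides do not say how MINERVA's experiments parameterized it, and the function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def lognormal_params(mean: float, cv: float) -> Tuple[float, float]:
    """(mu, sigma) of the underlying normal so that the log-normal has the
    given mean and coefficient of variation cv = std / mean."""
    sigma2 = math.log(1.0 + cv * cv)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def sample_attribute(baseline: float, cv: float, n: int, seed: int = 0) -> List[float]:
    """Draw n attribute values whose distribution mean is `baseline`."""
    mu, sigma = lognormal_params(baseline, cv)
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]
```

Raising `cv` keeps the average workload fixed while producing more very small and very large stores, the kind of spread that makes stores harder to pack tightly.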

Workload varianceWorkload variance

Whole System Validation

► MINERVA vs. human expert
► 3 aspects
  - Comparison of resulting system cost
  - Comparison of application performance
  - Low runtime and minimal human interaction
► Based on the TPC-D benchmark
  - Decision-support system based on database queries
► Human designers from the HP system benchmarking team

Execution Times