2019 Storage Developer Conference India © All Rights Reserved.
How IOT, Analytics and ML unfolds in Storage Fabric
Sharath T S, Microchip
Our Map!
Know your sources
Collect the data
Prepare data
Machine Learning
ML based Storage Mgmt.
Our itinerary
- Effect of IoT on Data Centers
- Effect of IoT in Data Centers
- Collection of data from sources
- Prepare data
- Applying ML for different use cases
- Data visualization
Effect of IoT on Data Centers
- 26 billion sensors by 2020 and 50 billion connected devices
- 5G IoT
- Edge computing
- Detailed analysis
- IoT impact on data-center management
Effect of IoT in Data Centers
Know your sources
End points in Data Centers
- EMS (Environmental Monitoring Systems) & ASHRAE (American Society of Heating, Refrigerating and Air-Conditioning Engineers)
- Temperature (18 to 27 °C)
- Humidity and water (RH 45% to 60%)
- Air flow sensors
- Static electricity sensors
- Server room and rack entry
- Aisle conditions
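The temperature and humidity ranges quoted above lend themselves to a simple range check on each reading. The following is a minimal sketch, assuming readings arrive as plain numbers; the constant and function names are illustrative and do not come from the talk.

```python
# Ranges as quoted on the slide; the constant names are hypothetical.
TEMP_RANGE_C = (18.0, 27.0)
RH_RANGE_PCT = (45.0, 60.0)


def check_reading(temp_c, rh_pct):
    """Return a list of human-readable violations for one sensor reading."""
    violations = []
    if not TEMP_RANGE_C[0] <= temp_c <= TEMP_RANGE_C[1]:
        violations.append(f"temperature {temp_c} C outside {TEMP_RANGE_C}")
    if not RH_RANGE_PCT[0] <= rh_pct <= RH_RANGE_PCT[1]:
        violations.append(f"humidity {rh_pct}% outside {RH_RANGE_PCT}")
    return violations
```

An empty list means the reading is inside both ranges; anything else could be forwarded to an alerting path.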
End points in Data Centers
- Server
- Storage controller
- Physical drives (S.M.A.R.T.)
- Chassis
Collection of data from sources
Know your sources
Collect the data
What data to collect?
- Sensor
  - Temperature, humidity, static electric charges, intrusion
- Storage
  - System, storage pools, storage volumes, drives and chassis
Example Data Collection
- System Information: CPU utilization, network utilization, memory utilization, OS details, uptime
- Storage Pool: status, interface, total size, unused size, spare rebuild mode, volume count, drive count, type
- Storage Volume: status, interface, total size, unused size, block size, RAID level, drive count, protected by hot spare, write cache, read cache, acceleration method
- Drive: manufacturer, type, status, interface, total size, unused size, reserved size, block size, transfer speed, SMART stats, bad blocks
- Storage Controller Information: status, mode, interface, temperature
When to collect data?
- Periodic interval
  - Time series analysis and forecasting
- Event based
  - User initiated, system initiated
How to collect data?
- Push mechanism
  - Source / system generated
  - Breach of any threshold value
  - A listener is required to read and store the value
- Pull mechanism
  - Application / user requested
  - On demand / periodic
  - Programmatic
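The difference between the two mechanisms is mostly about who initiates the transfer. The sketch below contrasts them in a few lines, assuming an in-process source; `ThresholdPusher` and `poll` are hypothetical names, and a real deployment would sit on top of whatever event or query interface the device exposes.

```python
import time


class ThresholdPusher:
    """Push: the source notifies registered listeners when a threshold is breached."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.listeners = []

    def subscribe(self, listener):
        self.listeners.append(listener)

    def report(self, name, value):
        # Only breaches are pushed; values within the threshold stay silent.
        if value > self.threshold:
            for listener in self.listeners:
                listener(name, value)


def poll(read_fn, samples, interval_s=0.0):
    """Pull: the application asks the source for a value on its own schedule."""
    collected = []
    for _ in range(samples):
        collected.append(read_fn())
        time.sleep(interval_s)
    return collected
```

With push, the listener (for example a function that appends to a store) only sees breaches; with pull, the caller decides how often and how many samples to take.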
Prepare data
Know your sources
Collect the data
Prepare data
Data
- Types of source data
  - Unstructured
  - Semi-structured
  - Structured
- Online Analytical Processing (OLAP) of prepared data
  - Cubes
  - Dimensions
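An OLAP cube is, at its simplest, a measure aggregated over every combination of some chosen dimensions. The following sketch shows that roll-up on a handful of made-up records; the field names (`day`, `pool`, `media`, `used_gb`) and the values are assumptions for illustration only.

```python
from collections import defaultdict


def roll_up(facts, dimensions, measure):
    """Sum `measure` over every combination of the chosen dimensions."""
    cube = defaultdict(float)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        cube[key] += fact[measure]
    return dict(cube)


# Hypothetical prepared records: one row per pool, media type and day.
facts = [
    {"day": "Mon", "pool": "A", "media": "SSD", "used_gb": 120.0},
    {"day": "Mon", "pool": "A", "media": "HDD", "used_gb": 800.0},
    {"day": "Tue", "pool": "A", "media": "SSD", "used_gb": 130.0},
    {"day": "Tue", "pool": "B", "media": "HDD", "used_gb": 500.0},
]

by_media = roll_up(facts, ["media"], "used_gb")
by_day_pool = roll_up(facts, ["day", "pool"], "used_gb")
```

Choosing a different list of dimensions gives a different slice of the same data without touching the records themselves.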
Introduction to Machine Learning
Know your sources
Collect the data
Prepare data
Machine Learning
Introduction to Machine Learning
- Introduction
- Flow
  1. Collecting data
  2. Preparing data
  3. Training the model
  4. Evaluating the model
  5. Improving performance
[Figure: "Evolution". Source: Digital Journal]
Data Feature Selection
- Filter method
  - Ex: Chi-square, LDA, ANOVA
  - Flow: Set of all features -> Selecting the best subset -> Learning algorithm -> Performance
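A filter method scores each feature on its own, before any model is trained. As a minimal sketch of the chi-square case, the code below scores binary features (for example "attribute above threshold: yes/no") against the good/failed label using the standard 2x2 chi-square statistic. The function names and the table layout are assumptions, not the speaker's implementation.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def rank_features(tables):
    """Rank features by chi-square score, highest first.

    `tables` maps a feature name to (a, b, c, d):
      a = failed drives with the flag set,  b = failed drives without it,
      c = good drives with the flag set,    d = good drives without it.
    """
    scores = {name: chi_square_2x2(*cells) for name, cells in tables.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A feature whose flag is distributed identically across good and failed drives scores zero and would be the first candidate to drop.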
Data Feature Selection…
- Wrapper method
  - Ex: Forward selection, backward elimination, recursive feature elimination, etc.
  - Flow: Set of all features -> Selecting the best subset (Generate subset <-> Learning algorithm) -> Performance
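In a wrapper method the learning algorithm itself is used to judge each candidate subset. The sketch below shows greedy forward selection under the assumption that the caller supplies a `score(subset)` function that trains and evaluates the model; that function, and the stopping rule used here, are illustrative choices.

```python
def forward_selection(features, score, max_features=None):
    """Greedy forward selection.

    `score(subset)` is a caller-supplied function that trains and evaluates
    the learning algorithm on `subset` and returns a number (higher is better).
    """
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    limit = max_features or len(remaining)
    while remaining and len(selected) < limit:
        # Try adding each remaining feature and keep the best candidate.
        candidate_scores = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidate_scores)
        if top_score <= best_score:
            break  # no candidate improves the model, so stop
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score
```

Backward elimination follows the same pattern in reverse: start from the full set and drop the feature whose removal hurts the score least.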
Data Feature Selection…
- Embedded method
  - Ex: LASSO, Ridge Regression
  - Flow: Set of all features -> Selecting the best subset (Generate subset <-> Learning algorithm + Performance)
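Embedded methods select features as a side effect of training. With LASSO, the L1 penalty drives the weights of unhelpful features to exactly zero (ridge regression, by contrast, shrinks weights but generally leaves them non-zero). The following is a small coordinate-descent sketch without an intercept term; it is meant to show the mechanism, not to replace a library implementation, and all names are illustrative.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`, clamping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(rows, targets, alpha, sweeps=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 without an intercept."""
    n, p = len(rows), len(rows[0])
    weights = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            norm = 0.0
            for x, y in zip(rows, targets):
                partial = sum(weights[k] * x[k] for k in range(p) if k != j)
                rho += x[j] * (y - partial)
                norm += x[j] * x[j]
            weights[j] = soft_threshold(rho / n, alpha) / (norm / n) if norm else 0.0
    return weights
```

Features whose weight ends at zero are the ones the model has effectively discarded.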
Apply ML for multiple use cases
Know your sources
Collect the data
Prepare data
Machine Learning
ML based Storage Mgmt.
Apply ML for multiple use cases…
- Case study - Data center management
  - Drive failure prediction
  - Storage tiering suggestion
  - Storage usage trend
  - Storage requirement prediction
Drive Failure Prediction
Workflow
- What is S.M.A.R.T.?
- Data set
- Periodic collection
- Selecting features / attributes
- Applying a support-vector machine to predict drive failure
- Outcome
Data set (Features)
- List of S.M.A.R.T. attributes
  - Read Error Rate
  - Reallocated Sectors Count
  - Spin Retry Count
  - End-to-End error
  - Temperature
  - Command Timeout
  - Reallocation Event Count
  - Uncorrectable Sector Count
  - Soft ECC Correction
  - G-Sense Error Rate
  - Loaded Hours
- Sampling data (periodic collection)

  Hours  Temp1  ...  ReadErr  Servo4
  2633   58     ...  6        2944
  2635   57     ...  13       2688
  2637   56     ...  36       5189
  2639   57     ...  0        4032
  2641   56     ...  0        8384
  ...    ...    ...  ...      ...
  2855   58     ...  14       3322
  2857   59     ...  20       2624
Prepared Data
- Feature selection

  Attribute             % Good  % Failed
  Temp1                 11.8    48.2
  Temp3                 35.2    45.3
  Temp4                 8.8     59.2
  Glist                 0.5     8.8
  ReadError1            0.4     0.8
  WriteError            0.8     2.3
  Reallocated sector    5.8     30.2
  Uncorrectable sector  4.8     34.5
  Spin-up time          5.2     14.2
  Command timeout       6.2     29.8
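One simple way to read the table above is to rank attributes by how much more often they show up for failed drives than for good ones. The sketch below does that, assuming the two columns are the share of good and failed drives flagged by each attribute; the talk does not spell out the exact definition, so treat the ranking criterion as an illustration rather than the speaker's method.

```python
# Values copied from the table: attribute -> (% good, % failed).
TABLE = {
    "Temp1": (11.8, 48.2),
    "Temp3": (35.2, 45.3),
    "Temp4": (8.8, 59.2),
    "Glist": (0.5, 8.8),
    "ReadError1": (0.4, 0.8),
    "WriteError": (0.8, 2.3),
    "Reallocated sector": (5.8, 30.2),
    "Uncorrectable sector": (4.8, 34.5),
    "Spin-up time": (5.2, 14.2),
    "Command timeout": (6.2, 29.8),
}


def rank_by_gap(table):
    """Order attributes by (% failed - % good), largest gap first."""
    return sorted(table, key=lambda name: table[name][1] - table[name][0], reverse=True)


ranking = rank_by_gap(TABLE)
```

Under this criterion the temperature and sector-health attributes land at the top, while the read and write error counters separate the two groups least.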
Machine Learning Model
- Support-vector machine (supervised learning)
  - A support-vector machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data, the algorithm outputs an optimal hyperplane that categorizes new examples.
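To make the hyperplane idea concrete, here is a minimal linear SVM trained by sub-gradient descent on the regularised hinge loss. It is a sketch under stated assumptions: the two features, their values, the learning rate and the regularisation strength are all invented for illustration, and a production system would more likely use an established library.

```python
def train_linear_svm(samples, labels, lr=0.05, lam=0.01, epochs=500):
    """Fit w, b by sub-gradient descent on the regularised hinge loss.

    `labels` must be +1 (failed) or -1 (good).
    """
    dim = len(samples[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: move the hyperplane.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Correct with room to spare: only apply the regulariser.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, already normalised features: (reallocated sectors, temperature).
good = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.30)]
failed = [(0.80, 0.90), (0.90, 0.70), (0.70, 0.80)]
w, b = train_linear_svm(good + failed, [-1] * 3 + [1] * 3)
```

The learned `w` and `b` define the separating hyperplane `w . x + b = 0`; `predict` simply reports which side of it a new drive falls on.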
[Figures: SVM illustrations plotted on X and Y axes, with one panel adding a Z axis]
Machine Learning
- Train the model (80%)
  - Good drive, failed drive -> SVM model
- Test the model (20%)
  - Unlabeled drive -> SVM model -> Performance -> Tune parameters
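The 80/20 split and the performance step can be sketched as follows. The helper names are hypothetical, the split is deliberately deterministic (in practice the records would usually be shuffled first), and the false-alarm rate is included because reducing false alarms is the outcome the talk is after.

```python
def split_80_20(records):
    """Deterministic split: first 80% for training, the rest for testing."""
    cut = int(len(records) * 0.8)
    return records[:cut], records[cut:]


def evaluate(predictions, actual):
    """Return accuracy and false-alarm rate (good drives flagged as failed).

    Labels follow the convention +1 = failed, -1 = good.
    """
    correct = sum(p == a for p, a in zip(predictions, actual))
    false_alarms = sum(p == 1 and a == -1 for p, a in zip(predictions, actual))
    good_total = sum(a == -1 for a in actual)
    return {
        "accuracy": correct / len(actual),
        "false_alarm_rate": false_alarms / good_total if good_total else 0.0,
    }
```

If the numbers on the held-out 20% are not acceptable, the model parameters are tuned and the train/test cycle is repeated.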
Outcome
- Fewer false failure alarms
- Automated policy implementation
Storage Tiering Suggestion
Storage Tiering
- Storage is physically partitioned into multiple distinct classes based on price, performance or other attributes
  - Swordfish – ClassOfService
- Data may be dynamically moved among classes within a tiered storage implementation
Storage Class (SNIA)
- Media class
  - High performance SSD/Cache
  - High performance HDD
  - High capacity HDD
  - Tape
- Data class
  - Mission critical
  - Hot
  - Cold
- Pricing class
  - Networked storage
  - DAS
  - Cloud
Feature Collection
- Data/block access frequency
- Last accessed
- Last modification time
- Size of object
- Encryption
- Drive type (HDD, SSD)
- Drive interface type
- Drive temperature
- Caching information
- …
Machine Learning Model
- k-means (unsupervised learning)
  - Clustering
  - Centroid
  - Aggregation
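For tiering, k-means can group blocks by how often they are accessed, with one cluster per tier. The sketch below runs Lloyd's algorithm on a single feature (access frequency) with caller-supplied starting centroids; the sample frequencies and the choice of three clusters are assumptions for illustration.

```python
def kmeans_1d(values, centroids, iterations=20):
    """Lloyd's algorithm on scalar values with caller-supplied starting centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical block access counts over one collection window.
access_freq = [2, 3, 5, 40, 45, 50, 400, 420, 450]
centroids, clusters = kmeans_1d(access_freq, centroids=[2, 45, 450])
```

Read from lowest to highest centroid, the three clusters could be mapped to HDD, SSD and cache respectively.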
Machine Learning
[Figure: three panels (On Cache, On SSD, On HDD), each with a block access frequency axis]
Storage Usage Trend
- How storage is used in a data center over a time period
- Time series algorithms
- Examples of trends
  - Which class of service will be used more in the future
  - Which media class will be used more in the future
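The simplest time series model for a usage trend is a straight line fitted to periodic samples. The following sketch uses ordinary least squares over equally spaced observations; it needs at least two samples, the names are illustrative, and real workloads with seasonality would call for a richer model.

```python
def fit_linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return slope, mean_v - slope * mean_t


def forecast(values, steps_ahead):
    """Extrapolate the fitted line `steps_ahead` periods past the last sample."""
    slope, intercept = fit_linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)
```

Running the same fit per media class or per class of service gives a rough view of which one is growing fastest.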
Data Visualization
Know your sources
Collect the data
Prepare data
Machine Learning
ML based Storage Mgmt.
Data Visualization
- Tools available
  - Power BI, Tableau, Datadog: commercial
  - Dash (personal favorite!): open source
- Dash
  - Pure Python-based framework
  - Abstracts away the underlying technologies and protocols
  - Ideal for building data visualization apps
  - https://plot.ly/products/dash/
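As a rough idea of how little code a Dash page needs, the sketch below builds a Plotly-style figure dictionary from collected samples and wraps it in a one-page app. It assumes a recent Dash release installed with `pip install dash`; the page title, the function names and the choice of plotting temperature against power-on hours are illustrative.

```python
def build_figure(hours, temps):
    """Return a Plotly-style figure dictionary for a temperature time series."""
    return {
        "data": [
            {
                "x": list(hours),
                "y": list(temps),
                "type": "scatter",
                "mode": "lines+markers",
                "name": "Temp1",
            }
        ],
        "layout": {
            "title": "Drive temperature over time",
            "xaxis": {"title": "Power-on hours"},
            "yaxis": {"title": "Temperature"},
        },
    }


def create_app(figure):
    """Wrap the figure in a one-page Dash app."""
    # Imported lazily so build_figure() can be used without Dash installed.
    from dash import Dash, dcc, html

    app = Dash(__name__)
    app.layout = html.Div([html.H1("Storage fabric health"), dcc.Graph(figure=figure)])
    return app
```

Calling `create_app(build_figure(hours, temps)).run(debug=True)` starts a local development server; older Dash releases expose the same entry point as `run_server`.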
An image of Halley's Comet taken in 1986. (Image: © NASA)
Fraunhofer lines, in astronomical spectroscopy
Thank you
[email protected]
https://www.linkedin.com/in/sharath-ts-5720a520