![Page 1: Infrastructures for machine learning - Dell Technologies · 2020. 8. 23. · Big data Problemrequires multiple nodes Not-so-homogeneous structure Mainly disk bound Communication is](https://reader035.vdocuments.mx/reader035/viewer/2022071417/61150d09c4a48b42ab5bf562/html5/thumbnails/1.jpg)
Infrastructures for machine learning
Antonio Cisternino (@cisterni), IT Center @Unipisa
Machine Learning hype in the news
The second machine age: the automation of control
World transformation
Energy
Control
Generating tables
We will initially need trained mathematicians writing tables (i.e. computer programs) but eventually computers will generate tables (programs) automatically
Alan Turing working on Colossus (not literal citation)
A «functional» element
Percept
Update env information
Make decision
Act
A «functional» element
Perception
Analysis
Classification/prediction
Action
Feedback
Cloud, HPC and the problem size
HPC
Problem requires multiple nodes
Homogeneous structure (jobs and schedulers…)
Mainly in memory
Frequent sync (order of latency in communications)

Cloud
Single node addressing multiple problems
Heterogeneous structure (just x86…)
Any memory and disk access patterns
Communication may happen even through L2 encapsulated in L3

Big data
Problem requires multiple nodes
Not-so-homogeneous structure
Mainly disk bound
Communication is rare

Machine learning
Not (yet) a single system
Highly variable structure
Highly variable model size
It may benefit from multiple nodes
(Traditional?) Machine Learning process
Acquire data
Training set / Test set
Training
Model validation
Production of the model
Data and computationally intensive
Model small with respect to data
Fast execution
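The steps of this process can be sketched end to end on a toy problem. Everything below is an assumption made for illustration: the synthetic data, the one-threshold "model", and the function names do not come from the talk. The point is only the shape of the pipeline: training scans all the data, while the resulting model is a single number that is cheap to execute.

```python
import random

def acquire_data(n=200, seed=0):
    """Generate synthetic (x, label) pairs; label is 1 when x > 0.5 (invented toy task)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    return [(x, 1 if x > 0.5 else 0) for x in xs]

def split(data, test_fraction=0.25):
    """Hold out the last part of the data as the test set."""
    cut = int(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]

def train(training_set):
    """Pick the threshold (among observed x values) with the fewest training errors."""
    def errors(t):
        return sum((1 if x > t else 0) != y for x, y in training_set)
    return min((x for x, _ in training_set), key=errors)

def validate(threshold, test_set):
    """Return the accuracy of the thresholded classifier on held-out data."""
    correct = sum((1 if x > threshold else 0) == y for x, y in test_set)
    return correct / len(test_set)

data = acquire_data()
training_set, test_set = split(data)
model = train(training_set)            # the whole model is one float
accuracy = validate(model, test_set)

def predict(x):
    """Production use of the model: a single comparison."""
    return 1 if x > model else 0

print(f"threshold={model:.3f} accuracy={accuracy:.2f}")
```

Training here is quadratic in the size of the training set, while `predict` is a single comparison, which mirrors the contrast between a data-intensive training phase and fast execution in production.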
Everyone looks for adaptive systems
Reward/punishment
Update the model
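One common way to turn reward and punishment into a model update is an incremental average, sketched below. This is an illustrative choice, not a method named in the talk; the `update` function, the learning rate and the feedback stream are all hypothetical.

```python
def update(value, reward, learning_rate=0.1):
    """Move the current estimate a small step toward the observed reward."""
    return value + learning_rate * (reward - value)

# Hypothetical feedback for one action: +1 is a reward, -1 a punishment.
feedback = [1, 1, -1, 1, 1]

value = 0.0
for reward in feedback:
    value = update(value, reward)

print(round(value, 4))
```

Because the model changes with every piece of feedback, its state depends on the whole history of interactions and cannot simply be retrained from a fixed data set.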
Deep learning ≠ Machine learning
Beliefs
«Future is deep learning and systems will learn like human brain»
«Accelerators are needed for efficient DNN»
«Half precision is the solution to everything»
Reality check
ML techniques that are not based on DNNs are actively used, and may include tree structures
HW acceleration may help speed depending on the size of the ML function
Sometimes filtering training data using less information may help generalization, but it's not always true
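To make the half-precision point concrete, the sketch below rounds a few values to IEEE 754 half precision using Python's standard `struct` module (format character `e`). The helper name `to_half` and the sample values are chosen here for illustration only.

```python
import struct

def to_half(x):
    """Round a Python float to IEEE 754 half precision and convert it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

print(to_half(0.1))      # close to 0.1, but not equal to it
print(to_half(2049.0))   # not every integer above 2048 is representable

try:
    to_half(70000.0)     # beyond the largest finite half-precision value (65504)
except OverflowError as exc:
    print("overflow:", exc)
```

Whether this loss of information is harmless, or even helpful, depends on the model and the data, which is the point of the reality check.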
Compute is important in ML
Ok, ok, you may need memory and computational power in some format and shape
Data persistence is important in ML
ML Model is important
Often hard or impossible to recreate (especially in adaptive systems)
Should be fast to access
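A minimal sketch of persisting a model so that it survives a restart, assuming the model fits in a small JSON document. The file name, the dictionary layout and the helper names are hypothetical. The write goes to a temporary file that is then renamed into place, so a reader never sees a half-written model.

```python
import json
import os
import tempfile

def save_model(model, path):
    """Write the model to a temporary file, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(model, f)
    os.replace(tmp_path, path)  # the old file is swapped for the new one in one step

def load_model(path):
    """Read the model back from disk."""
    with open(path) as f:
        return json.load(f)

# Invented model state, standing in for an adaptive system's parameters.
model = {"weights": [0.25, -1.5], "updates_seen": 1042}

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "model_checkpoint.json")
save_model(model, path)
restored = load_model(path)
print(restored == model)
```

For an adaptive system the checkpoint is the only copy of what was learned from past interactions, so how often it is written and how quickly it can be read back become infrastructure decisions.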
Fabric is important for ML
ML is a functional element that has to be placed in a larger computational infrastructure
Bandwidth is important to support data ingestion and output
Latency may be even more important in production
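A back-of-the-envelope model shows why the two metrics matter at different stages. The formula (time = latency + size / bandwidth) is a standard simplification, and the link speed, latency and message sizes below are illustrative numbers, not measurements from the talk.

```python
def transfer_time(size_bytes, bandwidth_bytes_per_s, latency_s):
    """Simplified cost of moving one message over a link."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

# Illustrative link: 10 Gbit/s with 100 microseconds of latency.
BANDWIDTH = 10e9 / 8   # bytes per second
LATENCY = 100e-6       # seconds

bulk = transfer_time(1e9, BANDWIDTH, LATENCY)      # 1 GB of training data
request = transfer_time(1e3, BANDWIDTH, LATENCY)   # 1 kB inference request

print(f"bulk: {bulk:.4f} s, latency share {LATENCY / bulk:.4%}")
print(f"request: {request * 1e6:.1f} us, latency share {LATENCY / request:.1%}")
```

Under these assumptions the bulk transfer is dominated by bandwidth, while the small production request spends almost all of its time waiting on latency.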
Cloud is important for ML
Some ML primitives are simply too big to be executed on prem
Users are a key part of on-line learning and adaptive systems
Is this architecture suitable for ML?
Bandwidth is limited by the architecture
A new dawn of computing
Computation
Processing: many core, accelerators
Storage: disks are getting only 100x slower than memory
Communication: low latency, high bandwidth
Programming: software & PL are getting distant from real architectures
A more reasonable interconnection…
Cloud
Front end
HPC
Big Data
Private cloud
Edge (IoT and mobile)
f (the functional element, shown five times across the diagram)
(My) Conclusions
ML is a functional primitive, not a system
The discipline is evolving and the balance between compute/storage/fabric may vary significantly over problems and time
Hardware acceleration will be an important part and reconfiguration of hardware will be more and more important
Edge ML will be part of the picture of Fog computing
ML should not be considered as a physical entity (like a cluster) in your data center