From Python to PySpark and Back Again - Unifying Single-host and Distributed Machine Learning with Maggy
Moritz Meister, @morimeister, Software Engineer, Logical Clocks
Jim Dowling, @jim_dowling, Associate Professor, KTH Royal Institute of Technology
ML Model Development
A simplified view
Exploration, Experimentation, Model Training, Explainability and Validation, Serving, Feature Pipelines
ML Model Development
Explore and Design
Experimentation: Tune and Search
Model Training (Distributed)
Explainability and Ablation Studies
It’s simple - only four steps
Artifacts and Non-DRY Code
Explore and Design
Experimentation: Tune and Search
Model Training (Distributed)
Explainability and Ablation Studies
What It’s Really Like… not linear but iterative
What It’s Really Really Like… not linear but iterative
Root Cause: Iterative Development of ML Models
Explore and Design
Experimentation: Tune and Search
Model Training (Distributed)
Explainability and Ablation Studies
Ablation Studies, EDA, HParam Tuning, Training (Dist)
Iterative Development Is a Pain, We Need DRY Code!
Each step requires different implementations of the training code
OBLIVIOUS TRAINING FUNCTION

    # RUNS ON THE WORKERS
    def train():
        def input_fn():
            # return dataset
        model = …
        optimizer = …
        model.compile(…)
        rc = tf.estimator.RunConfig('CollectiveAllReduceStrategy')
        keras_estimator = tf.keras.estimator.model_to_estimator(…)
        tf.estimator.train_and_evaluate(keras_estimator, input_fn)

Ablation Studies, EDA, HParam Tuning, Training (Dist)
The Oblivious Training Function
Challenge: Obtrusive Framework Artifacts
▪ TF_CONFIG
▪ Distribution Strategy
▪ Dataset (Sharding, DFS)
▪ Integration in Python - hard from inside a notebook
▪ Keras vs. Estimator vs. Custom Training Loop
Example: TensorFlow
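To make the first bullet concrete: in TensorFlow's multi-worker setup, each worker process reads a `TF_CONFIG` environment variable, a JSON document that describes the cluster and that worker's own role in it. The sketch below builds one such value per worker. The host names and port are made up for illustration, and `build_tf_config` is a helper name invented here.

```python
import json

def build_tf_config(worker_hosts, task_index):
    """Return the TF_CONFIG JSON string for one worker in a cluster."""
    return json.dumps({
        "cluster": {"worker": list(worker_hosts)},
        "task": {"type": "worker", "index": task_index},
    })

# Hypothetical addresses; a real cluster manager would supply these.
hosts = ["host-a:2222", "host-b:2222", "host-c:2222"]
configs = [build_tf_config(hosts, i) for i in range(len(hosts))]

for cfg in configs:
    print(cfg)
```

Each worker would export its own string as `TF_CONFIG` before the distribution strategy is created, which is exactly the kind of launch-time detail that does not exist when the same code runs on a single host.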
Where is Deep Learning headed?
Productive High-Level APIs
Or why data scientists love Keras and PyTorch
[Figure: cycle of Idea, Experiment, Results, supported by Infrastructure, Framework, Tracking, Visualization]
Francois Chollet, “Keras: The Next 5 Years”
? Hopsworks (Open Source), Databricks, Apache Spark, Cloud Providers
How do we keep our high-level APIs transparent and productive?
What Is Transparent Code?
    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.losses import SparseCategoricalCrossentropy
    from tensorflow.keras.optimizers import SGD

    # load_data() is not defined on the slide; it is assumed to return the
    # training images and labels as NumPy arrays.
    def dataset(batch_size):
        (x_train, y_train) = load_data()
        x_train = x_train / np.float32(255)
        y_train = y_train.astype(np.int64)
        train_dataset = tf.data.Dataset.from_tensor_slices(
            (x_train, y_train)).shuffle(60000).repeat().batch(batch_size)
        return train_dataset

    def build_and_compile_cnn_model(lr):
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(28, 28)),
            # Conv2D expects a channel axis; added here so the model builds.
            tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
            tf.keras.layers.Conv2D(32, 3, activation='relu'),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dense(10)
        ])
        model.compile(
            loss=SparseCategoricalCrossentropy(from_logits=True),
            optimizer=SGD(learning_rate=lr))
        return model

    def dataset(batch_size):
        (x_train, y_train) = load_data()
        x_train = x_train / np.float32(255)
        y_train = y_train.astype(np.int64)
        train_dataset = tf.data.Dataset.from_tensor_slices(
            (x_train, y_train)).shuffle(60000).repeat().batch(batch_size)
        return train_dataset

    def build_and_compile_cnn_model(lr):
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(28, 28)),
            # Conv2D expects a channel axis; added here so the model builds.
            tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
            tf.keras.layers.Conv2D(32, 3, activation='relu'),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dense(10)
        ])
        model.compile(
            loss=SparseCategoricalCrossentropy(from_logits=True),
            optimizer=SGD(learning_rate=lr))
        return model

NO CHANGES!
Building Blocks for Distribution Transparency
Distribution Context
Single-host vs. parallel multi-host vs. distributed multi-host
[Figure: three contexts. Single Host; a Driver (Experiment Controller) with Worker 1, Worker 2, ... Worker N; a Driver (TF_CONFIG) with Worker 1 to Worker 8]
Explore and Design
Experimentation: Tune and Search
Model Training (Distributed)
Explainability and Ablation Studies
Model Development Best Practices
▪ Modularize
▪ Parametrize
▪ Higher order training functions
▪ Usage of callbacks at runtime
Dataset Generation
Model Generation
Training Logic
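A minimal sketch of these four practices in plain Python, with no ML framework involved: dataset generation, model generation and training logic are separate functions, everything is parametrized, the training function receives the other two as arguments, and a callback observes progress at runtime. All names here (`make_dataset`, `make_model`, `train`, the toy one-weight model) are illustrative and not taken from the talk.

```python
def make_dataset(n_points):
    """Dataset generation: points on the line y = 3x."""
    return [(float(x), 3.0 * x) for x in range(1, n_points + 1)]

def make_model(initial_weight):
    """Model generation: a one-parameter linear model y = w * x."""
    return {"w": initial_weight}

def train(dataset_fn, model_fn, lr, epochs, callbacks=()):
    """Training logic: a higher-order function that takes the builders as arguments."""
    data = dataset_fn()
    model = model_fn()
    loss = None
    for epoch in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (model["w"] * x - y) * x for x, y in data) / len(data)
        model["w"] -= lr * grad
        loss = sum((model["w"] * x - y) ** 2 for x, y in data) / len(data)
        for cb in callbacks:
            cb(epoch, loss)
    return model, loss

history = []
model, final_loss = train(
    dataset_fn=lambda: make_dataset(5),
    model_fn=lambda: make_model(0.0),
    lr=0.01,
    epochs=200,
    callbacks=[lambda epoch, loss: history.append(loss)],
)
print(round(model["w"], 3), len(history))
```

Because `train` only sees callables and parameters, a surrounding system can swap the dataset, the model or the hyperparameters without touching the training logic.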
Oblivious Training Function as an Abstraction
Let the system handle the complexities
System takes care of ...

… fixing parameters
… launching the function

… launching trials (parametrized instantiations of the function)
… generating new trials
… collecting and logging results

… setting up TF_CONFIG
… wrapping in Distribution Strategy
… launching function as workers
… collecting results
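The trial-related duties (launching trials, generating new trials, collecting results) can be sketched as a small driver around an unchanged training function. This is a toy random search in plain Python, not Maggy's implementation; `run_trials`, the search-space format and the quadratic stand-in for training are assumptions made for illustration.

```python
import random

def training_function(lr, batch_size):
    """Oblivious training function: knows nothing about how it is launched.

    Stand-in for real training; returns a made-up validation loss that is
    lowest at lr=0.1 and batch_size=64.
    """
    return (lr - 0.1) ** 2 + ((batch_size - 64) / 64) ** 2

def run_trials(train_fn, searchspace, num_trials, seed=0):
    """Generate trials, launch train_fn with each one, and collect the results."""
    rng = random.Random(seed)
    results = []
    for trial_id in range(num_trials):
        # A trial is a parametrized instantiation of the function.
        params = {name: rng.choice(values) for name, values in searchspace.items()}
        metric = train_fn(**params)
        results.append({"trial": trial_id, "params": params, "metric": metric})
    best = min(results, key=lambda r: r["metric"])
    return best, results

searchspace = {"lr": [0.001, 0.01, 0.1, 1.0], "batch_size": [16, 32, 64, 128]}
best, results = run_trials(training_function, searchspace, num_trials=10)
print(best["params"], best["metric"])
```

Calling the function once with fixed parameters, for example `training_function(lr=0.1, batch_size=64)`, corresponds to the single-host case; the function body is the same in both.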
Maggy
Spark+AI Summit 2019
Today
With Hopsworks and Maggy, we provide a unified development and execution environment for distribution-transparent ML model development.
Make the Oblivious Training Function a core abstraction on Hopsworks
Hopsworks - Award-Winning Platform
Recap: Maggy - Asynchronous Trials on Spark
Spark is bulk-synchronous
[Figure: the Driver runs three stages of tasks (Task11 ... Task1N, Task21 ... Task2N, Task31 ... Task3N), each stage ending at a Barrier and producing Metrics1, Metrics2, Metrics3. Other labels: HopsFS, Early-Stopping, Wasted Compute (three times)]
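A small worked example of why barriers waste compute when trials differ in length, for instance because some are stopped early. The per-task durations are invented numbers. Within a stage every task waits for the slowest one, so a task's idle time is the stage maximum minus its own duration.

```python
def wasted_compute(stages):
    """Total idle time when each stage ends at a barrier.

    `stages` is a list of stages; each stage is a list of task durations.
    """
    total = 0
    for durations in stages:
        slowest = max(durations)
        total += sum(slowest - d for d in durations)
    return total

# Hypothetical durations in minutes for three stages of four tasks each.
stages = [
    [10, 4, 9, 2],   # two trials stopped early
    [10, 10, 3, 8],
    [5, 10, 10, 1],
]
busy = sum(sum(s) for s in stages)
idle = wasted_compute(stages)
print(f"busy={busy} min, idle={idle} min")
```

With these numbers the executors are busy for 82 task-minutes and idle for 38, so almost a third of the reserved capacity does no work.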
Recap: The Solution
Add Communication and Long Running Tasks
[Figure: a single stage of long-running tasks (Task11 ... Task1N) with one Barrier; tasks send Metrics to the Driver and receive a New Trial]
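The idea of long-running tasks can be sketched with threads and queues from the Python standard library: each worker keeps pulling the next trial as soon as it finishes the previous one and reports its metric back, so there is no barrier between trials. This is a simplification and not how Maggy is implemented; in particular the driver here pre-fills the trial queue instead of generating new trials from incoming metrics, and the squared parameter is a stand-in for a real training result.

```python
import queue
import threading

def worker(worker_id, trials, metrics):
    """Long-running task: keeps pulling trials until none are left."""
    while True:
        try:
            trial = trials.get_nowait()
        except queue.Empty:
            return
        # Stand-in for training; the "metric" is just a function of the parameter.
        metrics.put((worker_id, trial, trial ** 2))

trial_values = [0.5, 0.1, 0.3, 0.2, 0.4, 0.05]
trials = queue.Queue()
for value in trial_values:
    trials.put(value)
metrics = queue.Queue()

threads = [threading.Thread(target=worker, args=(i, trials, metrics)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # the only synchronization point, once at the very end

results = []
while not metrics.empty():
    results.append(metrics.get())
best = min(results, key=lambda r: r[2])
print(len(results), best[1])
```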
What’s New?
Worker discovery and distribution context set-up
[Figure: Driver, tasks Task11 ... Task1N and one Barrier; the steps are labeled Discover Workers and Launch Oblivious Training Function in Context]
What’s New: Distribution Context
    sp = maggy.optimization.Searchspace(...)
    # tf.distribute is the TensorFlow module that provides MirroredStrategy
    dist_strat = tf.distribute.MirroredStrategy(...)
    ab = maggy.ablation.AblationStudy(...)

    maggy.set_context('optimization')
    maggy.lagom(training_function, sp)

    maggy.set_context('distributed_training')
    maggy.lagom(training_function, dist_strat)

    maggy.set_context('ablation')
    maggy.lagom(training_function, ab)
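One way to picture what a context switch like this could do behind the scenes: a registry maps each context name to a launcher, and the same training function is handed to whichever launcher is active. The sketch below is a toy in plain Python and does not reflect Maggy's internals; `ToyDriver`, its methods and the three launchers are hypothetical.

```python
class ToyDriver:
    """Hypothetical stand-in for a driver that launches a function in a context."""

    def __init__(self):
        self._launchers = {}
        self._context = None

    def register(self, name, launcher):
        self._launchers[name] = launcher

    def set_context(self, name):
        if name not in self._launchers:
            raise ValueError(f"unknown context: {name}")
        self._context = name

    def launch(self, train_fn, config):
        if self._context is None:
            raise RuntimeError("no context set")
        return self._launchers[self._context](train_fn, config)

def training_function(lr=0.1, use_dropout=True, worker=0):
    """Oblivious training function: identical in every context."""
    return {"lr": lr, "use_dropout": use_dropout, "worker": worker}

driver = ToyDriver()
# Optimization: one call per candidate hyperparameter value.
driver.register("optimization", lambda fn, lrs: [fn(lr=lr) for lr in lrs])
# Distributed training: one call per (simulated) worker.
driver.register("distributed_training", lambda fn, n: [fn(worker=i) for i in range(n)])
# Ablation: one call with the component kept, one with it removed.
driver.register("ablation", lambda fn, flags: [fn(use_dropout=f) for f in flags])

driver.set_context("optimization")
opt_runs = driver.launch(training_function, [0.01, 0.1])
driver.set_context("distributed_training")
dist_runs = driver.launch(training_function, 4)
driver.set_context("ablation")
abl_runs = driver.launch(training_function, [True, False])
print(len(opt_runs), len(dist_runs), len(abl_runs))
```

The training function never inspects the context; only the second argument to `launch` changes, which mirrors the three calls on the slide.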
DEMO
What’s Next
Extend the platform to provide a unified development and execution environment for distribution-transparent Jupyter Notebooks.
Summary
▪ Moving between distribution contexts requires code rewriting
▪ Factor out obtrusive framework artifacts
▪ Let system handle distribution context
▪ Keep productive high-level APIs
Thank You!
Get Started
hopsworks.ai
github.com/logicalclocks/maggy
Twitter
@morimeister
@jim_dowling
@logicalclocks
@hopsworks
Web
www.logicalclocks.com
Contributions from colleagues
▪ Sina Sheikholeslami
▪ Robin Andersson
▪ Alex Ormenisan
▪ Kai Jeggle
Thanks to the Logical Clocks Team!