vegas: the missing matplotlib for scala/apache spark with roger menezes and db tsai
![Page 1: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/1.jpg)
Vegas: The Missing Matplotlib for Scala/Spark
DB Tsai, Roger Menezes
![Page 2: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/2.jpg)
Homepage Kids Page Downloads Page
Netflix Recommendations
Every aspect of the Experience is Machine Learned
![Page 3: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/3.jpg)
2017: > 100M members, > 190 countries
![Page 4: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/4.jpg)
Multiple Devices
![Page 5: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/5.jpg)
Genres: 23 rows/page average
Sims: 10 rows/page average
![Page 6: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/6.jpg)
My List:
Continue Watching:
Popular on Netflix:
Trending Now:
Watch It Again:
Top Picks:
Because You Watched:
Genres:
New Releases:
Recently Added:
Originals Row:
Billboard:
![Page 7: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/7.jpg)
ML at Netflix
● Optimize the Experimentation use case vs Productionization
● Experimentation
  ○ Opportunity sizing, Data Exploration
  ○ Tweaks to ML algos
  ○ Feature Selection
  ○ Model Evaluation
![Page 8: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/8.jpg)
Notebooks
● Optimal for Experimentation
● Sharing reproducible research
  ○ Facilitates feedback loop with PMs
● End to end ML experiment
  ○ Interactivity drives productivity
![Page 9: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/9.jpg)
Python Notebooks
![Page 10: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/10.jpg)
Python Notebooks
● Seamless Experience - ML experimentation
● Well known Scientific computing libraries
● Huge catalog of Visualization plotting libraries
  ○ Matplotlib, Seaborn, Bokeh, BQPlot, Lightning, etc.
![Page 11: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/11.jpg)
Scala Notebooks
● Zeppelin, Jupyter, Databricks, Spark-Notebooks, ...
● Computing library gap filling up
● Lack of Visualization Libraries
  ○ Main friction point in adoption
  ○ End to End ML use case not convincing
![Page 12: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/12.jpg)
Introducing Vegas
● Visualization Library in Scala
● Mainly built for the notebook use case
● Scala wrapper around Vega-Lite
● Missing MatPlotLib for the Scala and Spark world
![Page 13: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/13.jpg)
Vega-Lite
● Statistical Visualization
● Design considerations for Vega-Lite
  ○ Imperative vs Declarative API
![Page 14: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/14.jpg)
DECLARATIVE
STATISTICAL
VISUALIZATION GRAMMAR
IN SCALA
You tell it WHAT should be done with the data, and it knows HOW to do it!
Operations such as filtering, aggregation, faceting are built into the visualization, rather than putting the burden on the user to massage the data into shape.
Complex visualizations can be built with a few high level abstractions:
DATA, TRANSFORMS, SCALES, GUIDES, MARKS
cf. Altair talk by Brian Granger at PyData 2016: https://youtu.be/v5mrwq7yJc4
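The "WHAT vs HOW" split can be sketched in a few lines of plain Scala. This is only an illustration under stated assumptions: `BarSpec`, `Aggregate`, and `resolve` are hypothetical names invented here, not part of Vegas or Vega-Lite. The point is that the aggregation is a field of the spec, so the user never writes the group-by loop.

```scala
// Hypothetical mini-spec for illustration only; NOT the Vegas or Vega-Lite API.
object DeclarativeSketch {
  // One row of data: field name -> value.
  type Row = Map[String, Any]

  sealed trait Aggregate
  case object Mean extends Aggregate
  case object Sum extends Aggregate

  // WHAT to draw: bars of `y`, aggregated per distinct value of `x`.
  final case class BarSpec(x: String, y: String, aggregate: Aggregate)

  private def number(value: Any): Double = value match {
    case n: java.lang.Number => n.doubleValue
    case other => throw new IllegalArgumentException(s"not numeric: $other")
  }

  // HOW it is computed lives in the library, not in user code.
  def resolve(spec: BarSpec, data: Seq[Row]): Seq[(String, Double)] =
    data
      .groupBy(row => row(spec.x).toString)
      .map { case (key, rows) =>
        val values = rows.map(row => number(row(spec.y)))
        val result = spec.aggregate match {
          case Mean => values.sum / values.size
          case Sum  => values.sum
        }
        key -> result
      }
      .toSeq
      .sortBy(_._1)

  def main(args: Array[String]): Unit = {
    val data: Seq[Row] = Seq(
      Map("genre" -> "drama", "hours" -> 2.0),
      Map("genre" -> "drama", "hours" -> 4.0),
      Map("genre" -> "comedy", "hours" -> 1.0)
    )
    // Prints (comedy,1.0) then (drama,3.0)
    resolve(BarSpec("genre", "hours", Mean), data).foreach(println)
  }
}
```

Switching from `Mean` to `Sum` changes one word in the spec and nothing else, which is the property the slide attributes to a declarative grammar.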
![Page 15: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/15.jpg)
Added Bonus of Declarative Visualizations:
INTERACTIVITY!
D3JS
VEGAS
VEGAS CODE EXPANDS OUT TO D3JS CODE!
![Page 16: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/16.jpg)
Anatomy of a plot: Channels
X/Y channel
Shape Channel
Size Channel
Color Channel
![Page 17: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/17.jpg)
Features…
![Page 18: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/18.jpg)
1. Supports most plot types
![Page 19: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/19.jpg)
2. Trellis plots
![Page 20: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/20.jpg)
3. Layers
Layer 1.
Layer 2.
Layer 3.
![Page 21: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/21.jpg)
4. Notebook and Consoles
![Page 22: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/22.jpg)
5. Built-in spark support
Vegas
  .withDataFrame(myDataFrame)   // Pass in DF
  .encodeX("population")        // Mapped columns
  .encodeY("age")
![Page 23: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/23.jpg)
6. Visual statistics
● Advanced Binning
● Sorting
● Scaling
● Custom Transforms
● Time Series
● Aggregation
● Filtering
● Math functions (log, etc)
● Missing data support
● Descriptive Statistics
![Page 24: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/24.jpg)
How It Works!
![Page 25: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/25.jpg)
1. Specify in Scala
2. Embed HTML (iFrame)
3. Render within iFrame using JS
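The three steps above can be sketched roughly as follows. This is a minimal sketch, not Vegas's actual rendering code: the object and function names are hypothetical, and the CDN URLs and library versions are assumptions chosen for illustration. It takes an already-built Vega-Lite JSON spec, wraps it in a standalone HTML page that calls `vegaEmbed`, and places that page inside an iframe through the `srcdoc` attribute.

```scala
// Illustrative sketch only; names, CDN URLs and versions are assumptions.
object IframeSketch {
  // Escape text so it can sit inside a double-quoted HTML attribute.
  def escapeAttr(s: String): String =
    s.replace("&", "&amp;")
      .replace("\"", "&quot;")
      .replace("<", "&lt;")
      .replace(">", "&gt;")

  // Step 2: a standalone HTML page that loads the JS libraries and holds the spec.
  // Step 3 happens in the browser, when vegaEmbed renders the spec into #vis.
  def page(specJson: String): String =
    Seq(
      "<!DOCTYPE html>",
      "<html>",
      "<head>",
      "  <script src=\"https://cdn.jsdelivr.net/npm/vega@5\"></script>",
      "  <script src=\"https://cdn.jsdelivr.net/npm/vega-lite@5\"></script>",
      "  <script src=\"https://cdn.jsdelivr.net/npm/vega-embed@6\"></script>",
      "</head>",
      "<body>",
      "  <div id=\"vis\"></div>",
      "  <script>vegaEmbed(\"#vis\", " + specJson + ");</script>",
      "</body>",
      "</html>"
    ).mkString("\n")

  // The iframe isolates the plot's scripts and styles from the host notebook page.
  def iframe(specJson: String, width: Int = 600, height: Int = 400): String =
    "<iframe srcdoc=\"" + escapeAttr(page(specJson)) + "\" width=\"" + width +
      "\" height=\"" + height + "\" frameborder=\"0\"></iframe>"
}
```

A notebook front end would display the string returned by `IframeSketch.iframe(spec)` as HTML output; how that output is registered differs per notebook and is not shown here.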
![Page 26: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/26.jpg)
VEGA
D3JS
VEGA-LITE*
VEGAS
MORE ABSTRACTION
SCALA DSL EMITS TYPE-CHECKED VEGA-LITE JSON
VEGA-LITE CONVERTS INTERNALLY TO VEGA JSON SPEC
VEGA TRANSLATES JSON TO D3JS CODE THAT CAN BE VERY VERBOSE
A SCALA DSL FOR VEGA-LITE
* Vega-Lite
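To make the top layer of this stack concrete, here is a minimal sketch of how a Scala DSL can emit type-checked Vega-Lite JSON. It is not the Vegas implementation: `Plot`, `encode`, and the case objects below are hypothetical names, and the output covers only the `mark` and `encoding` parts of a Vega-Lite spec (data is omitted). Marks, channels, and data types are closed sets of Scala values, so a misspelled mark or channel is a compile error rather than a broken chart at render time.

```scala
// Hypothetical DSL for illustration; NOT the Vegas API.
object VegaLiteSketch {
  // Closed sets of values: only these marks, types and channels can be named.
  sealed abstract class Mark(val name: String)
  case object Bar extends Mark("bar")
  case object Point extends Mark("point")
  case object Line extends Mark("line")

  sealed abstract class DataType(val name: String)
  case object Quantitative extends DataType("quantitative")
  case object Nominal extends DataType("nominal")
  case object Ordinal extends DataType("ordinal")

  sealed abstract class Channel(val name: String)
  case object X extends Channel("x")
  case object Y extends Channel("y")
  case object Color extends Channel("color")
  case object Size extends Channel("size")
  case object Shape extends Channel("shape")

  final case class Encoding(channel: Channel, field: String, dataType: DataType)

  // JSON string literal with backslashes and double quotes escaped.
  private def str(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // JSON object from key -> already-serialized value pairs.
  private def obj(fields: (String, String)*): String =
    fields.map { case (k, v) => str(k) + ":" + v }.mkString("{", ",", "}")

  // Immutable builder: each encode call returns a new Plot.
  final case class Plot(mark: Mark, encodings: Vector[Encoding] = Vector.empty) {
    def encode(channel: Channel, field: String, dataType: DataType): Plot =
      copy(encodings = encodings :+ Encoding(channel, field, dataType))

    def toJson: String = {
      val encoding = obj(encodings.map { e =>
        e.channel.name -> obj("field" -> str(e.field), "type" -> str(e.dataType.name))
      }: _*)
      obj("mark" -> str(mark.name), "encoding" -> encoding)
    }
  }
}
```

For example, `Plot(Point).encode(X, "population", Quantitative).encode(Color, "country", Nominal).toJson` yields a JSON string whose `encoding` object has one entry per channel, which is the form Vega-Lite then expands into a full Vega spec.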
![Page 28: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/28.jpg)
What’s coming
![Page 29: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/29.jpg)
1. Interactive selections
![Page 30: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/30.jpg)
2. Selections transforms
![Page 31: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/31.jpg)
Contributors
![Page 32: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/32.jpg)
Thank you.
![Page 33: VEGAS: The Missing Matplotlib for Scala/Apache Spark with Roger Menezes and DB Tsai](https://reader034.vdocuments.mx/reader034/viewer/2022051504/5a1532b57f8b9a65768b47b1/html5/thumbnails/33.jpg)
@NetflixResearch @rogermenezes @dbtsai
The missing MatPlotLib for Scala/Spark
http://vegas-viz.org