Spark · 2018-03-24 · Summit - 1,164 participants from over 453 companies attended - Spark Training...
TRANSCRIPT
Spark
Spark - Summit - News - Basics - Advanced - Subprojects - Use Cases - Resources
Summit - 1,164 participants from over 453 companies attended - Spark Training sold out at 300 participants - 31 organizations sponsored the event - 12 keynotes and 52 community presentations were given
News - Project - Databricks
Project - 1.0.0 release - Graduated incubator - Very active community
Very active community - Top three Apache projects - Most active Big Data project - > 50 companies - > 250 contributors - > 175,000 LOC
Databricks - Certification - Cloud
Certification - Every certified app will run on every certified distribution - Distribution Partners - App Partners
Distribution Partners - Cloudera - MapR - Hortonworks - Pivotal - IBM - Amazon Web Services - SAP
App Partners - Alteryx - DataStax - 0xdata - Typesafe - Zoomdata
Cloud - Vision: Make Big Data Easy! - Product: Badass - Hosted Platform - Cluster Management - Interactive Workspace
Interactive Workspace - Notebooks - Dashboards - Jobs
Dashboards - WYSIWYG Builder - Interactive plots - One-click publishing
Spark Basics - Execution - RDDs - Caching - Broadcast - Languages
Execution - Apply Functional Operators across Distributed Collections - Master / Worker - Lazy - Parallelize with Threads first
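The "lazy" point on this slide can be illustrated with a minimal sketch in plain Python. `LazyCollection` is a hypothetical toy class, not Spark's RDD API: transformations only record what should happen, and nothing runs until an action (`collect` here) is called.

```python
class LazyCollection:
    """Toy stand-in for a distributed collection (hypothetical, not Spark's RDD)."""

    def __init__(self, source, ops=()):
        self._source = source    # any iterable standing in for the input data
        self._ops = tuple(ops)   # recorded transformations, not yet executed

    def map(self, fn):
        # Transformation: return a new collection with one more recorded step.
        return LazyCollection(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only recorded.
        return LazyCollection(self._source, self._ops + (("filter", pred),))

    def collect(self):
        # Action: run the whole recorded pipeline and materialize a list.
        out = iter(self._source)
        for kind, fn in self._ops:
            out = map(fn, out) if kind == "map" else filter(fn, out)
        return list(out)


pipeline = LazyCollection(range(5)).map(lambda x: x * x).filter(lambda v: v % 2 == 0)
result = pipeline.collect()  # [0, 4, 16]
```

Deferring the work until an action is requested lets an engine see the whole chain of operators before it decides how to execute it.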
RDDs - Interface for dataset - Backed by anything - Any InputFormat class - HDFS default
Caching - Store intermediate results in memory - Partition-locality - Significant speed-up for iterative algorithms
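Why caching helps iterative algorithms can be shown with a small plain-Python sketch. The `Dataset` class and `iterate` function are hypothetical names for illustration, not Spark's API; the counter shows how often the expensive computation is re-run with and without caching.

```python
class Dataset:
    """Toy dataset that recomputes on every read unless cached (hypothetical)."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument function producing the records
        self._cached = None
        self._use_cache = False
        self.computations = 0     # how many times the data was recomputed

    def cache(self):
        # Mark the dataset so the first materialization is kept in memory.
        self._use_cache = True
        return self

    def collect(self):
        if self._use_cache and self._cached is not None:
            return self._cached
        self.computations += 1
        data = list(self._compute())
        if self._use_cache:
            self._cached = data
        return data


def iterate(dataset, rounds):
    """Stand-in for an iterative algorithm that reads the same data each round."""
    total = 0
    for _ in range(rounds):
        total += sum(dataset.collect())
    return total


uncached = Dataset(lambda: (x * x for x in range(4)))
cached = Dataset(lambda: (x * x for x in range(4))).cache()
iterate(uncached, 3)  # recomputes the squares in every round
iterate(cached, 3)    # computes them once, then reads from memory
```

After both runs, `uncached.computations` is 3 while `cached.computations` is 1, which is the effect the slide attributes to keeping intermediate results in memory.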
Broadcast - Send immutable object to all workers - Similar to DistributedCache in MapReduce
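As a rough illustration of the broadcast idea, the sketch below shares one read-only lookup table with every task instead of shipping a copy with each record. It uses local threads as stand-in workers; `run_with_broadcast` is a hypothetical helper, not a Spark function.

```python
from concurrent.futures import ThreadPoolExecutor
from types import MappingProxyType


def run_with_broadcast(partitions, lookup):
    """Apply a shared, read-only lookup table to every partition in parallel."""
    # MappingProxyType gives a read-only view, mimicking an immutable broadcast value.
    shared = MappingProxyType(dict(lookup))

    def task(partition):
        # Each task only reads the shared table; it never modifies it.
        return [shared.get(key, "unknown") for key in partition]

    with ThreadPoolExecutor(max_workers=2) as pool:
        # pool.map preserves the order of the input partitions.
        return list(pool.map(task, partitions))


countries = {"us": "United States", "de": "Germany"}
resolved = run_with_broadcast([["us", "de"], ["fr"]], countries)
# [['United States', 'Germany'], ['unknown']]
```

Making the shared object read-only is what keeps this safe: no task can observe another task's changes, so the value can be sent once and reused.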
Languages - Scala - Python - Java 7 - Java 8 - R - Clojure
Advanced - Partitioning - Persistence Options - Checkpointing - Accumulators - Optimizations
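The slide only names these topics. As one possible illustration of the first, here is a minimal hash-partitioning sketch in plain Python. The function names are hypothetical and the choice of CRC32 as the hash is an assumption made for a stable result across runs; it is not how Spark hashes keys.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a key to a partition index using a stable hash (CRC32, an assumption)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def partition_by_key(pairs, num_partitions):
    """Group (key, value) records so that equal keys share a partition."""
    parts = defaultdict(list)
    for key, value in pairs:
        parts[partition_for(key, num_partitions)].append((key, value))
    return dict(parts)


records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
partitions = partition_by_key(records, 3)
```

Because every record with the same key lands in the same partition, a later per-key aggregation or join can work partition by partition without moving data again.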
Subprojects - SparkSQL - Tachyon - Spark Streaming - MLlib - GraphX - BlinkDB - Spark Job Server
SparkSQL - Replaces Shark - Core - Catalyst - Libraries
Core - SchemaRDDs - Query Execution - Caching
Catalyst - Relational algebra - Expressions / UDFs - Query Planning - Optimizer
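To give a feel for what a rule-based optimizer over expression trees does, here is a minimal sketch of one classic rewrite, constant folding, in plain Python. The `Literal`, `Column`, and `Add` classes are hypothetical stand-ins invented for this example; they are not Catalyst's classes, and the slides do not say which rules Catalyst applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def fold_constants(expr):
    """Rewrite rule: replace Add(Literal, Literal) subtrees with a single Literal."""
    if isinstance(expr, Add):
        left = fold_constants(expr.left)
        right = fold_constants(expr.right)
        if isinstance(left, Literal) and isinstance(right, Literal):
            return Literal(left.value + right.value)
        return Add(left, right)
    return expr  # literals and columns are already in simplest form


# x + (1 + 2) becomes x + 3, so the addition of constants is done once at planning time.
optimized = fold_constants(Add(Column("x"), Add(Literal(1), Literal(2))))
```

An optimizer of this style is essentially a collection of such tree-to-tree rules applied repeatedly until the plan stops changing.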
Libraries - POJOs - JDBC - JSON - Parquet - Hive
Hive - Catalog info from Metastore - Helps connect UIs like MicroStrategy / Tableau - Wrappers for UDFs, UDAFs, UDTFs - Supports TRANSFORM - Supports SerDes
Tachyon - In-Memory (Off-Heap) Distributed Datastore - Change URI from hdfs:// to tachyon:// - Share datasets between jobs without HDFS - Helps scaling by off-loading memory allocation and GC pauses from executor processes
Spark Streaming - Real-time streams - Micro-batching - Windowed Computations - Lambda Architecture
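Micro-batching and windowed computations can be sketched in plain Python as follows. The function names and the event format `(timestamp, value)` are assumptions made for this illustration, not Spark Streaming's API: events are first grouped into fixed-length batches, then a count is computed over a sliding window of the most recent batches.

```python
from collections import Counter, deque


def micro_batches(events, batch_seconds):
    """Group (timestamp, value) events into consecutive fixed-length batches."""
    batches = {}
    for ts, value in events:
        batches.setdefault(int(ts // batch_seconds), []).append(value)
    if not batches:
        return []
    first, last = min(batches), max(batches)
    # Intervals without events become empty batches so the timeline has no gaps.
    return [batches.get(i, []) for i in range(first, last + 1)]


def windowed_counts(batches, window_batches):
    """Count values over a sliding window covering the last `window_batches` batches."""
    window = deque(maxlen=window_batches)  # old batches fall out automatically
    results = []
    for batch in batches:
        window.append(batch)
        counts = Counter()
        for recent in window:
            counts.update(recent)
        results.append(dict(counts))
    return results


events = [(0.2, "a"), (0.9, "b"), (1.1, "a"), (3.5, "a")]
batches = micro_batches(events, batch_seconds=1)
# [['a', 'b'], ['a'], [], ['a']]
counts = windowed_counts(batches, window_batches=2)
# [{'a': 1, 'b': 1}, {'a': 2, 'b': 1}, {'a': 1}, {'a': 1}]
```

Treating a stream as a sequence of small batches is what allows the same batch operators to be reused for streaming work.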
MLlib - Summary statistics - Regression - Classification - Clustering - Collaborative Filtering - Optimization - Dimensionality Reduction
GraphX - Graph, VertexRDD, EdgeRDD objects and operations - Pregel API - mapReduceTriplets List<V,E,V> - Graph analytics libraries
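The Pregel model mentioned on this slide is vertex-centric: in each superstep, vertices receive merged messages, update their state, and send new messages along edges. The plain-Python sketch below is a simplified, single-machine illustration of that loop; the `pregel` and `shortest_paths` functions and their parameters are hypothetical and do not match GraphX's signatures. As a simplification, only edges whose source vertex received a message in the previous superstep are considered.

```python
import math


def pregel(vertices, edges, initial_msg, vprog, send_msg, merge_msg, max_iterations=20):
    """Simplified Pregel loop. vertices: {id: state}; edges: [(src, dst, attr)]."""
    state = {v: vprog(v, s, initial_msg) for v, s in vertices.items()}
    active = set(state)
    for _ in range(max_iterations):
        inbox = {}
        for src, dst, attr in edges:
            if src not in active:
                continue  # simplification: only recently updated sources send
            for target, msg in send_msg(src, state[src], dst, state[dst], attr):
                inbox[target] = merge_msg(inbox[target], msg) if target in inbox else msg
        if not inbox:
            break  # no messages in flight, so the computation has converged
        for v, msg in inbox.items():
            state[v] = vprog(v, state[v], msg)
        active = set(inbox)
    return state


def shortest_paths(vertex_ids, edges, source):
    """Single-source shortest paths expressed as a vertex program."""
    vertices = {v: (0.0 if v == source else math.inf) for v in vertex_ids}

    def vprog(_vertex, dist, msg):
        return min(dist, msg)  # keep the best distance seen so far

    def send_msg(_src, src_dist, dst, dst_dist, weight):
        if src_dist + weight < dst_dist:
            yield dst, src_dist + weight  # offer the neighbor a shorter path

    return pregel(vertices, edges, math.inf, vprog, send_msg, min)


weighted_edges = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0)]
distances = shortest_paths(["a", "b", "c"], weighted_edges, "a")
# {'a': 0.0, 'b': 1.0, 'c': 3.0}
```

The three user-supplied functions (update a vertex, send along an edge, merge messages) are the whole programming model; the loop itself stays the same for every algorithm.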
Graph analytics libraries - ConnectedComponents - PageRank - TriangleCount - ShortestPaths - SVDPlusPlus
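For readers unfamiliar with the algorithms listed here, the following plain-Python sketch shows PageRank in its textbook, normalized form using simple iteration. It is an illustration only and is not GraphX's implementation; the function name, defaults, and edge-list format are assumptions. Vertices without outgoing edges are not handled specially, so rank mass leaks if the graph has any.

```python
def page_rank(edges, num_iterations=20, damping=0.85):
    """Textbook PageRank over a list of (src, dst) edges."""
    vertices = sorted({v for edge in edges for v in edge})
    out_degree = {v: 0 for v in vertices}
    for src, _dst in edges:
        out_degree[src] += 1
    n = len(vertices)
    ranks = {v: 1.0 / n for v in vertices}  # start from a uniform distribution
    for _ in range(num_iterations):
        contrib = {v: 0.0 for v in vertices}
        for src, dst in edges:
            # Each vertex splits its current rank evenly among its out-links.
            contrib[dst] += ranks[src] / out_degree[src]
        ranks = {v: (1 - damping) / n + damping * contrib[v] for v in vertices}
    return ranks


links = [("a", "c"), ("b", "c"), ("c", "a")]
ranks = page_rank(links, num_iterations=100)
# "c" has two incoming links and ends up with the highest rank; "b" has none.
```

Each iteration only needs, per edge, the source's rank and out-degree, which is why the algorithm maps naturally onto edge-parallel graph operators.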
BlinkDB - Get estimated results - Time bound - Error bound
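The idea of an estimated result with an error bound can be sketched with simple random sampling in plain Python. This is a generic statistics illustration, not BlinkDB's method or API: `approximate_mean` is a hypothetical helper that returns a sample mean together with an approximate 95% margin of error (normal approximation, `z = 1.96`).

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, seed=0, z=1.96):
    """Estimate the mean of `values` from a random sample, with a margin of error."""
    rng = random.Random(seed)             # fixed seed keeps the sketch reproducible
    sample = rng.sample(values, sample_size)
    mean = statistics.fmean(sample)
    stderr = statistics.stdev(sample) / math.sqrt(sample_size)
    return mean, z * stderr               # estimate, plus or minus this margin


data = list(range(10_000))                # exact mean is 4999.5
estimate, margin = approximate_mean(data, sample_size=500)
```

The sample size is the knob behind both bounds on the slide: a smaller sample answers faster (time bound) but widens the margin, while a larger sample tightens the margin (error bound) at the cost of more work.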
Spark Job Server - Runs multiple jobs / contexts in same process - Allows for RDD Caching / Sharing between jobs - Job Persistence
Use Cases - Spotify - Real-time Auctions: ShareThrough - Real-time Recommendations: Graphflow - Cancer Genomics: AMPLab - Malware Detection: F-Secure - Media Distribution Analytics: NBC Universal - Personal Fitness: Jawbone - Neuroscience: HHMI
Resources - Code - Event - Technology - Videos
Code - https://github.com/apache/spark
Event - spark-summit.org - http://arjon.es/2014/06/30/spark-summit-2014-day-1/ - https://www.crowdchat.net/chat/c3BvdF9vYmpfODc= - https://nathanbrixius.wordpress.com/2014/07/02/spark-summit-keynote-notes/ - http://thomaswdinsmore.com/2014/07/03/spark-summit-2014-roundup/
Technology - Learning Spark (O'Reilly eBook) - www.spark-stack.org - ampcamp.berkeley.edu - https://amplab.cs.berkeley.edu/2013/10/23/got-a-minute-spin-up-a-spark-cluster-on-your-laptop-with-docker/
YouTube - AMPLab https://www.youtube.com/channel/UCWudC4d9i-2yxR5tuen-Nuw - Databricks https://www.youtube.com/channel/UC3q8O3Bh2Le8Rj1-Q-_UUbA - Apache Spark https://www.youtube.com/channel/UCRzsq7k4-kT-h3TDUBQ82-w