Spark Streaming
![Page 1: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/1.jpg)
www.edureka.co/apache-spark-scala-training
Spark Streaming
Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions
![Page 2: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/2.jpg)
Objectives of this Session
What is Big Data?
What is Spark?
Why Spark?
Spark Ecosystem
Spark Features
Scala overview
Spark Streaming Demo
For queries during the session and for the class recording: post on Twitter @edurekaIN with #askEdureka, or post on Facebook /edurekaIN.
![Page 3: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/3.jpg)
What is Big Data?
![Page 4: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/4.jpg)
What is Big Data?
![Page 5: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/5.jpg)
Big Data
Lots of Data (Terabytes or Petabytes)
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
[Word cloud around "Big Data": cloud, tools, statistics, NoSQL, compression, storage, support, database, analyze, information, terabytes, processing, mobile]
![Page 6: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/6.jpg)
What is Big Data?
![Page 7: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/7.jpg)
What is Big Data?
![Page 8: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/8.jpg)
What is Spark?
Apache Spark is a general-purpose, in-memory cluster computing system.
It provides high-level APIs in Java, Scala, and Python, and an optimized engine that supports general execution graphs.
It provides various high-level tools, such as Spark SQL for structured data processing, MLlib for machine learning, and more.
[Diagram: High-Level APIs, High-Level Tools, More…]
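To make "high-level APIs" concrete, here is a minimal word-count sketch in Scala, assuming Spark 2.x or later on the classpath and running in local mode (no cluster needed). `WordCountSketch` and `wordCounts` are illustrative names; `parallelize`, `flatMap`, `map`, `reduceByKey`, and `collect` are standard RDD operations.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: counts words with Spark's high-level RDD API.
object WordCountSketch {
  // Splits lines into words and counts each word as a Spark job.
  def wordCounts(sc: SparkContext, lines: Seq[String]): Map[String, Int] =
    sc.parallelize(lines)          // distribute the input as an RDD
      .flatMap(_.split("\\s+"))    // one record per whitespace-separated word
      .filter(_.nonEmpty)          // drop empty tokens
      .map(word => (word, 1))      // pair each word with a count of 1
      .reduceByKey(_ + _)          // sum the counts per word across partitions
      .collect()                   // bring the (small) result back to the driver
      .toMap

  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark in-process using all local cores.
    val conf = new SparkConf().setAppName("WordCountSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      println(wordCounts(sc, Seq("spark is fast", "spark is general")))
    } finally {
      sc.stop() // release the local Spark resources
    }
  }
}
```

The same chain of transformations reads almost identically in the Python and Java APIs, which is the point the slide makes about Spark being programmable at a high level.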
![Page 9: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/9.jpg)
Why Spark?
[Diagram: Cluster Manager, Deployment via YARN]
The Spark framework can be deployed through Apache Mesos, through Apache Hadoop via YARN, or with Spark’s own standalone cluster manager.
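In application code, the choice of cluster manager shows up as the master URL. The following is a minimal sketch, assuming Spark 2.x or later: the `ClusterManager` types, `DeploymentSketch`, and the host names are hypothetical, while the URL formats (`local[*]`, `spark://host:7077`, `mesos://host:5050`, `yarn`) and `SparkConf.setMaster` are Spark's own.

```scala
import org.apache.spark.SparkConf

// Hypothetical model of the deployment options named on the slide.
sealed trait ClusterManager
case object Local extends ClusterManager
final case class Standalone(host: String, port: Int = 7077) extends ClusterManager // Spark's own manager
final case class Mesos(host: String, port: Int = 5050) extends ClusterManager
case object Yarn extends ClusterManager

object DeploymentSketch {
  // Maps each cluster manager to the master URL format Spark expects.
  def masterUrl(cm: ClusterManager): String = cm match {
    case Local                  => "local[*]"
    case Standalone(host, port) => s"spark://$host:$port"
    case Mesos(host, port)      => s"mesos://$host:$port"
    // For YARN, the cluster location is read from the Hadoop/YARN config
    // (HADOOP_CONF_DIR or YARN_CONF_DIR), so the URL carries no host.
    case Yarn                   => "yarn"
  }

  // Builds a SparkConf that targets the chosen cluster manager.
  def conf(cm: ClusterManager): SparkConf =
    new SparkConf().setAppName("DeploymentSketch").setMaster(masterUrl(cm))
}
```

In practice the master is usually supplied at launch time, for example with `spark-submit --master yarn`, rather than hard-coded, so the same jar can run under any of these cluster managers.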
![Page 10: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/10.jpg)
Why Spark?
[Diagram: Polyglot, Scala]
The Spark framework is polyglot: it can be programmed in several languages (currently Scala, Java, and Python are supported).
![Page 11: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/11.jpg)
Why Spark?
Provides powerful caching and disk persistence capabilities
Interactive Data Analysis
Faster Batch
Iterative Algorithms
Real-Time Stream Processing
Faster Decision-Making
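The caching and disk persistence listed above can be sketched as follows, assuming Spark 2.x or later on the classpath and a hypothetical `<LEVEL> <message>` log format. `PersistSketch` and `levelCounts` are illustrative names; `persist`, `StorageLevel.MEMORY_AND_DISK`, `count`, and `unpersist` are standard RDD API calls. The parsed dataset is persisted once and then reused by two actions instead of being recomputed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistSketch {
  // Returns (number of FATAL lines, number of ERROR lines).
  def levelCounts(sc: SparkContext, logLines: Seq[String]): (Long, Long) = {
    val levels = sc.parallelize(logLines)
      .map(_.split(" ", 2)(0))               // hypothetical format: first token is the level
      .persist(StorageLevel.MEMORY_AND_DISK)  // keep in memory, spill to disk if it does not fit

    val fatal = levels.filter(_ == "FATAL").count() // first action computes and caches the partitions
    val error = levels.filter(_ == "ERROR").count() // second action reads the cached partitions
    levels.unpersist()                              // release the storage once done
    (fatal, error)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PersistSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val sample = Seq("INFO started", "ERROR disk slow", "FATAL disk failed", "ERROR retrying")
      println(levelCounts(sc, sample)) // prints the (FATAL, ERROR) counts
    } finally {
      sc.stop()
    }
  }
}
```

On RDDs, `cache()` is shorthand for `persist(StorageLevel.MEMORY_ONLY)`; `MEMORY_AND_DISK` is used here to illustrate the disk persistence the slide mentions. This reuse of in-memory data is also what speeds up iterative algorithms and interactive analysis.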
![Page 12: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/12.jpg)
Spark Community is Super Active!
![Page 13: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/13.jpg)
Spark Ecosystem
Spark Core Engine
Alpha/Pre-alpha
Shark (SQL)
Spark Streaming (Streaming)
MLlib (Machine Learning)
GraphX (Graph Computation)
SparkR (R on Spark)
BlinkDB (Approximate SQL)
![Page 14: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/14.jpg)
Spark Ecosystem (Contd.)
Spark Core Engine
Alpha/Pre-alpha
Shark (SQL): Used for structured data. Can run unmodified Hive queries on an existing Hadoop deployment.
Spark Streaming (Streaming): Enables analytical and interactive applications on live streaming data.
MLlib (Machine Learning): A machine learning library being built on top of Spark, with support for many machine learning algorithms at speeds up to 100 times faster than MapReduce.
GraphX (Graph Computation): A graph computation engine (similar to Giraph).
SparkR (R on Spark): A package for the R language that lets R users leverage Spark from the R shell.
BlinkDB (Approximate SQL): An approximate query engine that runs over the Spark Core engine.
![Page 15: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/15.jpg)
Spark Ecosystem (Contd.)
![Page 16: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/16.jpg)
Spark Functional Features
![Page 17: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/17.jpg)
Spark Non-functional Features
![Page 18: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/18.jpg)
![Page 19: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/19.jpg)
Introduction to Scala
![Page 20: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/20.jpg)
Introduction to Scala
![Page 21: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/21.jpg)
Scala frameworks
![Page 22: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/22.jpg)
Why Scala?
![Page 23: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/23.jpg)
Demo: Spark Streaming
Write a Spark Streaming program that counts the number of lines containing the word “FATAL” and keeps reporting the count on the console.
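One way to write this demo, sketched under assumptions: Spark 2.x or later with the spark-streaming module (DStream API) on the classpath, log lines arriving as text on a TCP socket (for example from `nc -lk 9999`), and a 10-second batch interval. `FatalLineCounter` and `isFatal` are illustrative names; `StreamingContext`, `socketTextStream`, `filter`, `count`, and `print` are part of Spark Streaming's API.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FatalLineCounter {
  // Predicate applied to every incoming line (case-sensitive match on "FATAL").
  def isFatal(line: String): Boolean = line.contains("FATAL")

  def main(args: Array[String]): Unit = {
    // Assumed defaults: read from localhost:9999 unless host/port are given.
    val host = if (args.length > 0) args(0) else "localhost"
    val port = if (args.length > 1) args(1).toInt else 9999

    // local[2]: one thread runs the socket receiver, the other processes batches.
    val conf = new SparkConf().setAppName("FatalLineCounter").setMaster("local[2]")
    // Group the incoming stream into 10-second micro-batches (interval is an assumption).
    val ssc = new StreamingContext(conf, Seconds(10))

    val lines = ssc.socketTextStream(host, port)     // DStream[String], one element per line
    val fatalCounts = lines.filter(isFatal).count()  // DStream[Long], one count per batch
    fatalCounts.print()                              // report each batch's count on the console

    ssc.start()            // start receiving and processing
    ssc.awaitTermination() // keep running until the job is stopped
  }
}
```

This reports the count for each batch. If "keeps reporting" is meant as a running total across batches, a stateful operation such as `updateStateByKey` would be needed instead, which requires a checkpoint directory set with `ssc.checkpoint`.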
![Page 24: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/24.jpg)
Questions?
Buy the Spark course at: www.edureka.co
![Page 25: Spark Streaming](https://reader036.vdocuments.mx/reader036/viewer/2022062320/55be3229bb61eb6c498b463f/html5/thumbnails/25.jpg)