spark architecture - github pages · spark is fault tolerant. if a slave machine crashes, it's...

Spark architecture Spark architecture

Upload: others

Post on 12-Aug-2020

4 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Page 1: Spark architecture - GitHub Pages · Spark is fault tolerant. If a slave machine crashes, it's RDD's will be recomputed. If hours of computation have been completed before the crash,

Spark architectureSpark architecture

Page 2: Spark architecture - GitHub Pages · Spark is fault tolerant. If a slave machine crashes, it's RDD's will be recomputed. If hours of computation have been completed before the crash,

In local installation, cores serve as master & slaves

Hardware organizationHardware organization

Page 3: Spark architecture - GitHub Pages · Spark is fault tolerant. If a slave machine crashes, it's RDD's will be recomputed. If hours of computation have been completed before the crash,

CommunicationCommunication

ShShufufflefle

Same machines are used for both map and reduce (decreasescommunication but only slightly)Communication between slaves is the toughest bottleneck.Design your computation to minimize communication.

Page 4: Spark architecture - GitHub Pages · Spark is fault tolerant. If a slave machine crashes, it's RDD's will be recomputed. If hours of computation have been completed before the crash,

spatial software organizationspatial software organization

The driver runs on themasterIt executes the "main()"code of your program.

The Cluster Mastermanages thecomputation resources.Mesos and Yarn areresource managementprograms for clusters.

Workers run on the slaves (usually one per core)

Each RDD is partitioned among the workers,

Workers manage partitions and Executors

Executors execute tasks on their partition, are myopic.

Page 5: Spark architecture - GitHub Pages · Spark is fault tolerant. If a slave machine crashes, it's RDD's will be recomputed. If hours of computation have been completed before the crash,

spatial organizationspatial organization

(more detail)(more detail)

SparkContext (sc) is the abstraction that encapsulates the cluster forthe driver node (and the programmer).Worker nodes manage resources in a single slave machine.Worker nodes communicate with the cluster manager.Executors are the processes that can perform tasks.Cache refers to the local memory on the slave machine.

Page 6: Spark architecture - GitHub Pages · Spark is fault tolerant. If a slave machine crashes, it's RDD's will be recomputed. If hours of computation have been completed before the crash,

RDD Processing RDD Processing

RDDs, by default, are not materializedThey do materialize if cached orotherwise persisted.

Page 7: Spark architecture - GitHub Pages · Spark is fault tolerant. If a slave machine crashes, it's RDD's will be recomputed. If hours of computation have been completed before the crash,

Temporal organizationTemporal organization

RDD Graph and Physical planRDD Graph and Physical plan

Recall Spatial organization

A stage ends

when the RDD needs

to be materialized

Page 8: Spark architecture - GitHub Pages · Spark is fault tolerant. If a slave machine crashes, it's RDD's will be recomputed. If hours of computation have been completed before the crash,

Terms and concepts of executionTerms and concepts of execution

RDDs are partitioned across workers, each workermanages a one partition of each RDD.RDD graph defines the Lineage of the RDDs.SparkContext divides the RDD graph into stages whichdefines the execution plan (or physical plan)A task corresponds to the to one stage, restricted to onepartition.An executor is a process that can perform tasks.

Page 9: Spark architecture - GitHub Pages · Spark is fault tolerant. If a slave machine crashes, it's RDD's will be recomputed. If hours of computation have been completed before the crash,

PersistancePersistanceand Checkpointingand Checkpointing

Page 10: Spark architecture - GitHub Pages · Spark is fault tolerant. If a slave machine crashes, it's RDD's will be recomputed. If hours of computation have been completed before the crash,

Levels of persistanceLevels of persistanceCaching is useful for retaining intermediate resultsOn the other hand, caching can consume a lot of memoryIf memory is exhausted, caches can be eliminated, spilledto disk etc.If needed again, cache is recomputed or read from disk.The generalization of .cache() is called .persist() which hasmany options.

Page 11: Spark architecture - GitHub Pages · Spark is fault tolerant. If a slave machine crashes, it's RDD's will be recomputed. If hours of computation have been completed before the crash,

Storage LevelsStorage Levels.cache() same as .persist(MEMORY_ONLY)

Page 12: Spark architecture - GitHub Pages · Spark is fault tolerant. If a slave machine crashes, it's RDD's will be recomputed. If hours of computation have been completed before the crash,

CheckpointingCheckpointingSpark is fault tolerant. If a slave machine crashes, it'sRDD's will be recomputed.If hours of computation have been completed before thecrash, all the computation needs to be redone.Checkpointing reduces this problem by storing thematerialized RDD on a remote disk.On Recovery, the RDD will be recovered from the disk.It is recommended to cache an RDD before checkpointingit.

Spark SQL | Apache Spark

[Spark meetup] Spark Streaming Overview

Spark Plug Thread Repair Spark Plug Spark Plug Sockets for

MODEL 917.257632 · 2015-02-28 · witha spark arrester meeting applicable local or state laws (ifany). If a spark arrester isused, itshould be maintained in effectiveworkingorder

MODEL NUMBER 917.259551 OWNER'SMANUAL · with a spark arrester meeting applicable local or state laws (if any)_ If a spark arrestor is used, it should be maintained in effective working

What is SPARK? - UHgabriel/courses/cosc6339_s17/BDA_11_Spark.pdf · What is SPARK? •In-Memory Cluster ... –Hadoop, –Mesos, •Spark ... Spark Essentials •Spark program has

OM, HD145H42G, 1997-08, TRACTORS/RIDE MOWERSis equipped with a spark arrester meeting applicable local or state laws (if any). If a spark arrester is used, it should be maintained

Spark Architecture · Spark Architecture Spark Shuffle ... Spark Shuffle Spark DataFrames . ... – Entry point of the Spark Shell (Scala, Python, R) – The place where SparkContext

Spark Plug Thread Repair Spark Plug Spark Plug Sockets for Ford

Single-View X-ray Depth Recoverycampar.in.tum.de/pub/albarqouni2016ipcai/albarqouni2016ipcai... · Training Data (DRR + DM) Testing Image Recomputed from the compact dictionary Meta

NewTek Spark Plus IOa6ce85f34b101e4ba428-38e91d4533ffbe5c8042650a77a3ed34.r56...If you connected a video source in step 4 above, your Spark Plus is already sending NDI audio and video

MODEL NUMBER 917.258872 OWNER'S MANUAL · with a spark arrester meeting applicable local or state laws (if any).. If a spark at'rester is used, it should be maintained in effective