TRANSCRIPT
![Page 1: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/1.jpg)
Spark
(CISL)
![Page 2: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/2.jpg)
Spark
• UC Berkeley, AMPLab; Matei Zaharia
• 2010: open-sourced under the BSD license
• 2014: Apache top-level project
• 863
![Page 3: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/3.jpg)
Spark
• Resilient Distributed Datasets (RDDs)
• RDD operations: map, filter, join, reduce, groupBy, …
• RDD lineage
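The lineage idea can be sketched without Spark. The following is a toy model in plain Scala, not Spark's implementation: `ToyRDD`, `Source`, `Mapped`, and `Filtered` are hypothetical names. Each dataset remembers its parent and the transformation applied, so its contents can be recomputed from the source at any time.

```scala
// Toy model of RDD lineage; all names here are hypothetical, not Spark APIs.
sealed trait ToyRDD[A] {
  // Recompute this dataset from its lineage (nothing is stored by default).
  def compute(): Seq[A]
  // Lineage chain, oldest ancestor first.
  def lineage: List[String]

  def map[B](f: A => B): ToyRDD[B] = Mapped(this, f)
  def filter(p: A => Boolean): ToyRDD[A] = Filtered(this, p)
}

final case class Source[A](data: Seq[A]) extends ToyRDD[A] {
  def compute(): Seq[A] = data
  def lineage: List[String] = List("source")
}

final case class Mapped[A, B](parent: ToyRDD[A], f: A => B) extends ToyRDD[B] {
  def compute(): Seq[B] = parent.compute().map(f)
  def lineage: List[String] = parent.lineage :+ "map"
}

final case class Filtered[A](parent: ToyRDD[A], p: A => Boolean) extends ToyRDD[A] {
  def compute(): Seq[A] = parent.compute().filter(p)
  def lineage: List[String] = parent.lineage :+ "filter"
}

object LineageDemo {
  val numbers: ToyRDD[Int] = Source(Seq(1, 2, 3, 4))
  // Nothing runs here; the value only records how to derive the result.
  val evensDoubled: ToyRDD[Int] = numbers.filter(_ % 2 == 0).map(_ * 2)
}
```

Calling `LineageDemo.evensDoubled.compute()` walks the chain back to the source and re-applies each step, which is the property that lets a lost partition be rebuilt rather than restored from a replica.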
![Page 4: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/4.jpg)
Spark
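val logs = spark.textFile("hdfs://…")
// Split each CSV line, keep rows whose LEVEL is ERROR, and project the MSG column.
val errMsgs = logs.map(_.split(","))
  .filter(_(0) == "ERROR")
  .map(_(1))
// Keep the filtered messages in memory for repeated queries.
errMsgs.cache()
errMsgs.filter(_.contains("foo")).count()
// Input file layout:
// header: LEVEL,MSG
// INFO,msg1
// ERROR,msg2
// …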
[Figure: the Driver sends tasks to two Executors; each Executor applies the RDD transforms to its partition (logs1 to msgs1, logs2 to msgs2) and returns results to the Driver.]
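The driver/executor split on this slide can be mimicked with ordinary collections. This is a plain-Scala sketch with no Spark dependency; `partitions`, `runTask`, and `driverCount` are hypothetical names, and the sample rows are made up for illustration.

```scala
// Plain-Scala sketch of per-partition tasks; not Spark code.
object PartitionSketch {
  // Two input partitions standing in for logs1 and logs2 (sample data).
  val partitions: Seq[Seq[String]] = Seq(
    Seq("INFO,msg1", "ERROR,foo failed"),
    Seq("ERROR,msg2", "ERROR,disk foo")
  )

  // Work done by one task on one partition: parse, keep ERROR rows, take MSG.
  def runTask(partition: Seq[String]): Seq[String] =
    partition.map(_.split(",")).filter(_(0) == "ERROR").map(_(1))

  // Driver side: every task returns a partial count and the driver sums them.
  def driverCount(keyword: String): Long =
    partitions.map(p => runTask(p).count(_.contains(keyword)).toLong).sum
}
```

Only the small partial counts travel back to the driver; the filtered messages stay where they were computed, which is what `cache()` exploits on repeated queries.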
![Page 5: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/5.jpg)
Spark
• Spark SQL: structured data processing
• Spark Streaming: stream processing
• MLlib: machine learning
• GraphX: graph processing
![Page 6: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/6.jpg)
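Spark SQL
• DataFrames
val errMsgs = sqlCtx.read.format("csv")
  .option("header", "true") // use the first row (LEVEL,MSG) as column names
  .load("hdfs://…")
  .where("LEVEL = 'ERROR'")
  .select("MSG")
errMsgs.cache()
// '%foo%' matches any message containing "foo", like the RDD version.
errMsgs.where("MSG like '%foo%'").count()
// Input file layout:
// header: LEVEL,MSG
// INFO,msg1
// ERROR,msg2
// …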
![Page 7: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/7.jpg)
Spark
• Cluster managers: Hadoop YARN, Mesos, standalone
• Storage: HDFS, Cassandra, Azure, S3
![Page 8: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/8.jpg)
Spark
• Azure Data Lake
• Spark
• HDInsight Spark
[Figure: Azure Data Lake. Labels: Analytics (analytics service, HDInsight clusters), YARN, WebHDFS, Store (unstructured, semi-structured, structured data), C#.]
![Page 9: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/9.jpg)
Spark
• Azure Data Lake
• Spark
• HDInsight Spark
• Spark
• .NET
![Page 10: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/10.jpg)
Bing
• ("FastSML")
• TB
• (Operational Intelligence)
• …
![Page 11: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/11.jpg)
Bing: FastSML
[Figure: FastSML event merge pipeline. Raw events (Click, UI-Layout, …) arrive via Kafka and the Databus; click (C) and UI-layout (U) events are merged within a 10-minute app-time window into MergedEvent records. Stages are numbered 1 to 4.]
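A window-based merge of this kind can be sketched in a few lines of plain Scala. The event shapes, the `id` join key, and the tumbling-window rule below are all assumptions for illustration; the slides do not show the actual FastSML schemas or merge logic.

```scala
// Hypothetical event shapes; the real FastSML schemas are not shown in the slides.
final case class Click(id: String, timeMs: Long)
final case class Layout(id: String, timeMs: Long, layout: String)
final case class MergedEvent(id: String, window: Long, layout: String)

object EventMergeSketch {
  val WindowMs: Long = 10L * 60L * 1000L // 10-minute app-time window

  // Index of the window an event falls into, based on its own timestamp.
  def windowOf(timeMs: Long): Long = timeMs / WindowMs

  // Join clicks with layouts that share an id and fall into the same window.
  def merge(clicks: Seq[Click], layouts: Seq[Layout]): Seq[MergedEvent] = {
    val layoutsByKey: Map[(String, Long), Seq[Layout]] =
      layouts.groupBy(l => (l.id, windowOf(l.timeMs)))
    for {
      c <- clicks
      l <- layoutsByKey.getOrElse((c.id, windowOf(c.timeMs)), Seq.empty)
    } yield MergedEvent(c.id, windowOf(c.timeMs), l.layout)
  }
}
```

The window is computed from the event's own timestamp (app time) rather than arrival time. A simplification to note: with fixed windows, a click and a layout that straddle a window boundary are not merged.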
![Page 12: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/12.jpg)
FastSML + Spark?
Apache Storm (SCP.Net) + Kafka + Microsoft's
• Spark?
• FastSML?
• C#
![Page 13: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/13.jpg)
Spark + .NET
.NET
• C#, Spark
• .NET
• Spark, .NET
Spark + .NET = Mobius!!!
![Page 14: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/14.jpg)
Mobius
• August 2015
• CISL, ASG (Bing), 5
• November 2015
• MIT license
• V1.5.2, V1.6.0
• 4 V1.6.1
![Page 15: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/15.jpg)
Mobius @ GitHub
• Apache Spark Wiki
• http://github.com/Microsoft/Mobius
• 758, 2
• 131, 4K
![Page 16: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/16.jpg)
Mobius
C# Spark • • • • •
![Page 17: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/17.jpg)
Word Count
Scala
val textFile = spark.textFile("hdfs://…")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://…")
C#
var textFile = sparkContext.TextFile(@"hdfs://…");
var counts = textFile.FlatMap(line => line.Split(' '))
  .Map(word => new KeyValuePair<string, int>(word, 1))
  .ReduceByKey((x, y) => x + y)
  .Map(wc => string.Format("{0},{1}", wc.Key, wc.Value));
counts.SaveAsTextFile(@"hdfs://…");
![Page 18: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/18.jpg)
Mobius
• Spark
• JVM–CLR (.NET VM)
• Spark runs on the JVM
• C# runs on the CLR
• PySpark, SparkR
![Page 19: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/19.jpg)
[Figure: Mobius architecture. On the driver, the C# Driver (CLR) communicates with the Java Driver (JVM) over IPC sockets. On each of three workers, a Spark Executor (JVM) communicates with a C# Worker (CLR) over IPC sockets. Arrows in each pair are labeled Method and Result.]
![Page 20: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/20.jpg)
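Driver-side Interop - DataFrame
[Figure: numbered call sequence between the Java/Scala components (JVM) and the C# components.]
• CSharpRunner is called by sparkclr-submit.cmd.
• CSharpBackend launches a Netty server, creating a proxy for JVM calls.
• The C# driver (user code) is launched as a sub-process.
• SqlContext init invokes a JVM method to create the context; the SqlContext (Spark) is created in the JVM.
• The C# SqlContext has a reference to the SC in the JVM.
• createDF invokes a JVM method to create the DF; the JVM uses jsc and creates the DataFrame (Spark).
• The C# DF has a reference to the DF in the JVM.
• Calls on the C# DataFrame invoke the method on the DF in the JVM.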
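The handle-passing pattern in this sequence can be sketched in plain Scala. This is a toy model, not Mobius code: `ToyBackend`, `ContextProxy`, and `DataFrameProxy` are hypothetical names, a direct method call stands in for the socket round trip, and a comma-separated string stands in for real data.

```scala
import scala.collection.mutable

// Toy stand-in for the JVM-side backend: it owns the real objects and hands
// out integer handles. All names are hypothetical, not Mobius classes.
object ToyBackend {
  private val objects = mutable.Map.empty[Int, Any]
  private var nextId = 0

  private def register(obj: Any): Int = {
    nextId += 1
    objects(nextId) = obj
    nextId
  }

  // Single entry point for every call, as a socket server would expose.
  // Returns either a new handle or a plain value.
  def invoke(target: Int, method: String, arg: String): Any =
    (objects.get(target), method) match {
      case (None, "createContext")       => register("context")
      case (Some("context"), "createDF") => register(arg.split(",").toSeq)
      case (Some(rows: Seq[_]), "count") => rows.size
      case _ => throw new IllegalArgumentException(s"unknown call: $method")
    }
}

// Caller-side proxies hold only the handle, never the data itself.
final class DataFrameProxy(val handle: Int) {
  def count(): Int = ToyBackend.invoke(handle, "count", "").asInstanceOf[Int]
}

final class ContextProxy {
  val handle: Int = ToyBackend.invoke(0, "createContext", "").asInstanceOf[Int]
  def createDF(csv: String): DataFrameProxy =
    new DataFrameProxy(ToyBackend.invoke(handle, "createDF", csv).asInstanceOf[Int])
}
```

Because the proxies carry only a handle, the data never has to cross to the calling side for driver-level operations; only method names, arguments, and small results do.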
![Page 21: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/21.jpg)
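Executor-side Interop - RDD
[Figure: Scala component (Spark Executor) and C# component (C# Worker).]
• Spark calls Compute() in the Spark Executor.
• The C# Worker executable is launched as a sub-process.
• Data and the user-implemented C# lambda are serialized and sent through a socket.
• Processed data is serialized and sent back through the socket.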
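The exchange on this slide can be sketched in memory. In the sketch below, byte arrays stand in for the socket, and the framing (a record count followed by length-prefixed records) is an assumption for illustration, not the Mobius wire format. The user function is passed directly rather than serialized, which is a further simplification.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.nio.charset.StandardCharsets.UTF_8

// In-memory sketch of the executor/worker exchange; names are hypothetical.
object WorkerExchangeSketch {
  // Sender side: write a record count followed by length-prefixed payloads.
  def writeRecords(records: Seq[String]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new DataOutputStream(buffer)
    out.writeInt(records.size)
    records.foreach { r =>
      val bytes = r.getBytes(UTF_8)
      out.writeInt(bytes.length)
      out.write(bytes)
    }
    out.flush()
    buffer.toByteArray
  }

  // Receiver side: read the frames back into records.
  def readRecords(payload: Array[Byte]): Seq[String] = {
    val in = new DataInputStream(new ByteArrayInputStream(payload))
    val n = in.readInt()
    (0 until n).map { _ =>
      val bytes = new Array[Byte](in.readInt())
      in.readFully(bytes)
      new String(bytes, UTF_8)
    }
  }

  // Worker: deserialize the input, apply the user function, serialize the output.
  def worker(payload: Array[Byte], f: String => String): Array[Byte] =
    writeRecords(readRecords(payload).map(f))
}
```

Every record is encoded and decoded twice per trip through the worker, which is the cost the later performance slides aim to reduce.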
![Page 22: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/22.jpg)
• Driver
• SparkR
• Netty server, JVM
• Worker
• PySpark
![Page 23: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/23.jpg)
Mobius
• Spark Executor, CSharpWorker
• DataFrame, CSharpWorker
• Spark Core codegen
• C# / Java
[Figure: within an Executor, data passes between the Java Executor and the C# Worker for Transformation #1 and again for Transformation #2, with serialization/deserialization (SER/DE) at each crossing.]
![Page 24: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/24.jpg)
Mobius
• Mobius • • •
• Mobius • •
![Page 25: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/25.jpg)
Mobius
• Spark C# • • •
![Page 26: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/26.jpg)
Mobius
• Roslyn • .NET •
![Page 27: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/27.jpg)
Mobius
• Roslyn
• Roslyn C# Interpreter
[Figure: an Assembly Service and the Mobius SparkCtx connect to a Spark Cluster consisting of a Master and three Workers.]
![Page 28: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/28.jpg)
Mobius as a Service
• Spark
• Pay-by-Job
[Figure: clients (Jupyter, Zeppelin, Shell, …) talk over a REST message protocol to a Hosted Service (Mobius Service, Mobius Shell), which drives a Spark Cluster consisting of a Master and three Workers.]
![Page 29: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/29.jpg)
Mobius
• API
[Figure: a Routing Layer in front of multiple Mobius Service Endpoints (…); each endpoint exposes an Interpreter API. Shells shown: Mobius/Spark Shell, Hive Shell.]
![Page 30: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/30.jpg)
Mobius
• Elasticity and Fault Tolerance
• Mobius Service endpoints
[Figure: a Routing Layer in front of multiple Mobius Service Endpoints (…), each with an Interpreter API, backed by a Persistent Store.]
![Page 31: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/31.jpg)
• Mobius
• Mobius, Spark
• github.com/Microsoft/Mobius
• Any contribution is welcome!
• Jupyter/Zeppelin/…
• Puppet/Chef/…
• …
![Page 32: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/32.jpg)
(CISL)
• Mobius
• Yarn++
• Apache REEF
• Rayon
• Tiered Storage
• …
![Page 33: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/33.jpg)
• CISL • [email protected]
![Page 34: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/34.jpg)
![Page 35: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/35.jpg)
Interactive Log Analysis
• Massive Cosmos log analysis: several PBs per day
• Rapid iterative drill-downs for DRIs to diagnose issues
[Figure: incident workflow from Alerting to Triage. Labels: Customer (or other DRIs); AutoPilot Watchdog Alerts; DRI (Designated Responsible Individual); Perf Counters: examine 1 hour, 2 hours … 14 days; Architects/Developers/Other DRIs; RDP to Cosmos machines: open individual connections for troubleshooting; Vendors/Secondary DRIs: eavesdrop and document incident; Scope Studio.]
Spark?
![Page 36: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/36.jpg)
C# API for Spark
[Figure: API stack. Apache Spark exposes the Scala/Java API, SparkR, PySpark, and the C# API; Spark apps in C# sit on top of the C# API.]
![Page 37: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/37.jpg)
Mobius Service Differentiators
• Better fault tolerance: session persistence and replay
[Figure: a Routing Layer in front of multiple Mobius Service Endpoints (…), backed by a Persistent Store.]
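Session persistence and replay can be sketched as a statement log that any endpoint can re-run. The model below is hypothetical: the slides do not describe how Mobius Service stores sessions, so `ReplaySketch`, its in-memory `store`, and the `name=value` statement format are assumptions made for illustration.

```scala
import scala.collection.mutable

// Hypothetical model: a session is the ordered list of statements a user ran.
object ReplaySketch {
  // Stand-in for the persistent store shared by all endpoints.
  val store: mutable.Map[String, Vector[String]] = mutable.Map.empty

  // A toy endpoint whose interpreter state is just a map of variables.
  final class Endpoint {
    val vars: mutable.Map[String, Int] = mutable.Map.empty

    // Statements have the form "name=value"; real ones would be C# code.
    private def eval(stmt: String): Unit = stmt.split("=") match {
      case Array(name, value) => vars(name.trim) = value.trim.toInt
      case _ => throw new IllegalArgumentException(s"bad statement: $stmt")
    }

    // Persist the statement before evaluating it, so that it can be replayed.
    def run(session: String, stmt: String): Unit = {
      store(session) = store.getOrElse(session, Vector.empty) :+ stmt
      eval(stmt)
    }

    // Rebuild interpreter state on a fresh endpoint from the persisted log.
    def recover(session: String): Unit =
      store.getOrElse(session, Vector.empty).foreach(eval)
  }
}
```

If the endpoint holding a session fails, a routing layer could send the user to a new endpoint, which calls `recover` to reach the same state before accepting further statements.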
![Page 38: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/38.jpg)
Mobius Service Differentiators
• Elasticity and Fault Tolerance
• Auto-scale # of Mobius Service endpoints
![Page 39: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/39.jpg)
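Driver-side Interop - DataFrame
[Figure: numbered call sequence (steps 1 to 12) between the Java/Scala components (JVM) and the C# components.]
• CSharpRunner is called by sparkclr-submit.cmd.
• CSharpBackend launches a Netty server, creating a proxy for JVM calls.
• The C# driver (user code) is launched as a sub-process.
• SqlContext init invokes a JVM method to create the context; the SqlContext (Spark) is created in the JVM.
• The C# SqlContext has a reference to the SC in the JVM.
• createDF invokes a JVM method to create the DF; the JVM uses jsc and creates the DataFrame (Spark).
• The C# DF has a reference to the DF in the JVM.
• An operation on the C# DataFrame invokes a JVM method, which invokes the method on the DF.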
![Page 40: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/40.jpg)
Executor-side Interop - RDD
[Figure: Scala component (CSharpRDD) and C# component (C# Worker).]
• Spark calls Compute() on CSharpRDD.
• The C# Worker executable is launched as a sub-process.
• Data and the user-implemented C# lambda are serialized and sent through a socket.
• Processed data is serialized and sent back through the socket.
CSharpRDD is used only when customized C# code is used in a transformation.
![Page 41: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/41.jpg)
Mobius Performance Considerations
1. One CSharpWorker process for each JVM executor process
• PySpark forks a Python process for each task thread in the JVM executor process
• New C# option of one thread for each task thread in the JVM executor process
2. C# operations are pipelined when possible
• Map & Filter RDD operations in C# need data to be passed from the JVM to C#, incurring the cost of serialization and deserialization
• C# operations are pipelined when possible to minimize data passing
3. DataFrame operations without C# UDFs do not require CSharpWorker
• Same execution plan optimization and code generation in Spark Core
• Perform the same as Scala applications
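The benefit of pipelining in point 2 can be shown with a small counting model. This is a sketch in plain Scala, not Mobius code: `Boundary` and its `crossings` counter are hypothetical and only model the number of serialization steps, not their real cost.

```scala
// Counts boundary crossings for pipelined vs. unpipelined execution.
object PipelineSketch {
  final class Boundary {
    var crossings = 0
    // Simulate shipping a batch to the other runtime and getting results back.
    def roundTrip[A, B](data: Seq[A])(f: A => B): Seq[B] = {
      crossings += 2 // one serialization in each direction
      data.map(f)
    }
  }

  val twice: Int => Int = _ * 2
  val addOne: Int => Int = _ + 1

  // One round trip per transformation.
  def unpipelined(b: Boundary, data: Seq[Int]): Seq[Int] =
    b.roundTrip(b.roundTrip(data)(twice))(addOne)

  // Compose the functions first, then cross the boundary once.
  def pipelined(b: Boundary, data: Seq[Int]): Seq[Int] =
    b.roundTrip(data)(twice.andThen(addOne))
}
```

Both variants produce the same output, but the pipelined one crosses the boundary half as often for two chained transformations, and the saving grows with the length of the chain.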
![Page 42: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/42.jpg)
Mobius Service Differentiators
• Elasticity and Fault Tolerance
• Use virtual actor model
• Auto-scale # of Mobius Service endpoints
[Figure: a Routing Layer in front of multiple Mobius Service Endpoints (…), backed by a Persistent Store.]
![Page 43: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/43.jpg)
Linux Support
• Mono and CoreCLR (ongoing) for Mobius on Linux
• GitHub project uses Travis for CI in Ubuntu 14.04.3 LTS
• Unit tests and samples (functional tests) are run
• [email protected]
![Page 44: ë Ü ¼ Spark 1 O à - pic.huodongjia.com€¦ · Spark 1 2 e • [´ W 1 _ Ä R X Û • §UC Berkeley, AMPLab 1Matei Zaharia ¤ • 2010 òBSD b O à • 2014 n Apache ® F ±](https://reader036.vdocuments.mx/reader036/viewer/2022071218/60533593c3182807a04a2005/html5/thumbnails/44.jpg)
CSharpRDD
• C#, CLR
• C# => JVM
• RDD<byte[]>
• C# worker
• Transformations are pipelined when possible