MongoDB & Spark
TRANSCRIPT
![Page 1: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/1.jpg)
MongoDB + Spark
@blimpyacht
![Page 2: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/2.jpg)
Level Setting
![Page 3: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/3.jpg)
![Page 4: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/4.jpg)
![Page 5: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/5.jpg)
![Page 6: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/6.jpg)
TROUGH OF DISILLUSIONMENT
![Page 7: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/7.jpg)
HDFS
Distributed Data
![Page 8: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/8.jpg)
HDFS
YARN
Distributed Resources
![Page 9: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/9.jpg)
HDFS
YARN
MapReduce
Distributed Processing
![Page 10: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/10.jpg)
HDFS
YARN
Hive
Pig
Domain Specific Languages
MapReduce
![Page 11: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/11.jpg)
Interactive Shell
Easy (-er)
Caching
![Page 12: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/12.jpg)
HDFS
Distributed Data
![Page 13: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/13.jpg)
HDFS
YARN
Distributed Resources
![Page 14: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/14.jpg)
HDFS
YARN
Spark
Hadoop
Distributed Processing
![Page 15: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/15.jpg)
HDFS
YARN
Spark
Hadoop
Mesos
![Page 16: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/16.jpg)
HDFS
Stand Alone
YARN
Spark
Hadoop
Mesos
![Page 17: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/17.jpg)
HDFS
Stand Alone
YARN
Spark
Hadoop
Mesos
Hive
Pig
![Page 18: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/18.jpg)
HDFS
Stand Alone
YARN
Spark
Hadoop
Mesos
Hive
Pig
SparkShell
![Page 19: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/19.jpg)
HDFS
Stand Alone
YARN
Spark
Hadoop
Mesos
Hive
Pig
SparkShell
SparkStreaming
![Page 20: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/20.jpg)
HDFS
Stand Alone
YARN
Spark
Hadoop
Mesos
Hive
Pig
SparkSQL
SparkShell
SparkStreaming
![Page 21: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/21.jpg)
HDFS
Stand Alone
YARN
Spark
Hadoop
Mesos
Hive
Pig
SparkSQL
SparkShell
SparkStreaming
![Page 22: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/22.jpg)
HDFS
Stand Alone
YARN
Spark
Hadoop
Mesos
Hive
Pig
SparkSQL
SparkShell
SparkStreaming
![Page 23: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/23.jpg)
Stand Alone
YARN
Spark
Hadoop
Mesos
Hive
Pig
SparkSQL
SparkShell
SparkStreaming
![Page 24: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/24.jpg)
SparkStreaming
Hive
SparkShell
Mesos
Hadoop
Pig
SparkSQL
Spark
Stand Alone
YARN
![Page 25: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/25.jpg)
Stand Alone
YARN
Spark
Mesos
SparkSQL
SparkShell
SparkStreaming
![Page 26: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/26.jpg)
Stand Alone
YARN
Spark
Mesos
SparkSQL
SparkShell
SparkStreaming
![Page 27: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/27.jpg)
executor
Worker Node
executor
Worker Node
Driver
Resilient Distributed Datasets
![Page 28: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/28.jpg)
Parallelization
Parallelize = x
![Page 29: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/29.jpg)
Transformations
Parallelize = x
t(x) = x’
t(x’) = x’’
![Page 30: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/30.jpg)
Transformations
filter( func )
union( otherDataset )
intersection( set )
distinct( n )
map( function )
![Page 31: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/31.jpg)
Action
f(x’’) = y
Parallelize = x
t(x) = x’
t(x’) = x’’
![Page 32: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/32.jpg)
Actions
collect()
count()
first()
take( n )
reduce( function )
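The split between transformations and actions can be sketched without a cluster. The example below is only an analogy using `java.util.stream` from the Java standard library, not the Spark API: intermediate stream operations stand in for transformations and a terminal operation stands in for an action. The class name `LazyPipeline` and the counter are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Analogy only: Java streams, like RDD transformations, do no work until
// a terminal operation (the "action") asks for a result.
final class LazyPipeline {
    // Counts how many times the filter predicate has actually run.
    static final AtomicInteger filterCalls = new AtomicInteger();

    // "Parallelize = x", then t(x) = x' and t(x') = x''. Nothing is evaluated here.
    static Stream<Integer> transformed(List<Integer> x) {
        return x.parallelStream()
                .filter(n -> { filterCalls.incrementAndGet(); return n % 2 == 1; })
                .map(n -> n * n);
    }

    public static void main(String[] args) {
        Stream<Integer> xPrimePrime = transformed(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println("filter calls before the action: " + filterCalls.get());

        // The action, f(x'') = y, triggers the whole pipeline.
        int y = xPrimePrime.reduce(0, Integer::sum);
        System.out.println("y = " + y + ", filter calls after the action: " + filterCalls.get());
    }
}
```

In Spark the same shape would be expressed with `parallelize`, `filter`, `map` and `reduce` on an RDD; the point of the sketch is only that building the pipeline and running it are separate steps.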
![Page 33: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/33.jpg)
Lineage
f(x’’) = y
Parallelize = x
t(x) = x’
t(x’) = x’’
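One way to read the lineage idea: a dataset keeps the recipe that produced it (source plus the chain of transformations) rather than only the result, so the result can be rebuilt on demand. The sketch below illustrates that idea in plain Java under that assumption. It is not how Spark implements RDDs, and the `Lineage` class with its `parallelize`, `map`, `collect` and `steps` methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch: a dataset that stores how to compute itself.
final class Lineage<T> {
    private final Supplier<List<T>> compute; // recipe, re-run on every collect()
    private final String steps;              // human-readable chain of steps

    private Lineage(Supplier<List<T>> compute, String steps) {
        this.compute = compute;
        this.steps = steps;
    }

    // "Parallelize = x": wrap the source data.
    static <T> Lineage<T> parallelize(List<T> source) {
        List<T> copy = new ArrayList<>(source);
        return new Lineage<>(() -> copy, "parallelize");
    }

    // "t(x) = x'": record the transformation, do not apply it yet.
    <R> Lineage<R> map(Function<T, R> t) {
        return new Lineage<>(() -> {
            List<R> out = new ArrayList<>();
            for (T item : compute.get()) {
                out.add(t.apply(item));
            }
            return out;
        }, steps + " -> map");
    }

    // "f(x'') = y": replay the recorded steps from the source.
    List<T> collect() {
        return new ArrayList<>(compute.get());
    }

    String steps() {
        return steps;
    }

    public static void main(String[] args) {
        Lineage<Integer> x = Lineage.parallelize(Arrays.asList(1, 2, 3));
        Lineage<Integer> xPrimePrime = x.map(n -> n + 1).map(n -> n * 10);
        System.out.println(xPrimePrime.steps());
        System.out.println(xPrimePrime.collect());
        // A second collect() replays the same steps, as a lost result could be rebuilt.
        System.out.println(xPrimePrime.collect());
    }
}
```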
![Page 34: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/34.jpg)
Lineage
Parallelize -> Transform -> Transform -> Action
![Page 35: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/35.jpg)
Lineage
Five parallel chains, each: Parallelize -> Transform -> Transform -> Action
![Page 36: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/36.jpg)
![Page 37: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/37.jpg)
![Page 38: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/38.jpg)
https://github.com/mongodb/mongo-hadoop
![Page 39: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/39.jpg)
{
  "_id" : ObjectId("4f16fc97d1e2d32371003e27"),
  "body" : "the scrimmage is still up in the air. ...",
  "subFolder" : "notes_inbox",
  "mailbox" : "bass-e",
  "filename" : "450.",
  "headers" : {
    "X-cc" : "",
    "From" : "[email protected]",
    "Subject" : "Re: Plays and other information",
    "X-Folder" : "\\Eric_Bass_Dec2000\\Notes Folders\\Notes inbox",
    "Content-Transfer-Encoding" : "7bit",
    "X-bcc" : "",
    "To" : "[email protected]",
    "X-Origin" : "Bass-E",
    "X-FileName" : "ebass.nsf",
    "X-From" : "Michael Simmons",
    "Date" : "Tue, 14 Nov 2000 08:22:00 -0800 (PST)",
    "X-To" : "Eric Bass",
    "Message-ID" : "<6884142.1075854677416.JavaMail.evans@thyme>",
    "Content-Type" : "text/plain; charset=us-ascii",
    "Mime-Version" : "1.0"
  }
}
![Page 40: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/40.jpg)
{
  "_id" : ObjectId("4f16fc97d1e2d32371003e27"),
  "body" : "the scrimmage is still up in the air. ...",
  "subFolder" : "notes_inbox",
  "lfpwoojjf0wig=-i1qf=q0qif0=i38 \-00\ 1-8" : "bass-e",
  "filename" : "450.",
  "headers" : {
    "X-cc" : "",
    "From" : "[email protected]",
    "Subject" : "Re: Plays and other information",
    "X-Folder" : "\\Eric_Bass_Dec2000\\Notes Folders\\Notes inbox",
    "Content-Transfer-Encoding" : "7bit",
    "X-bcc" : "",
    "To" : "[email protected]",
    "X-Origin" : "Bass-E",
    "X-FileName" : "ebass.nsf",
    "X-From" : "Michael Simmons",
    "Date" : "Tue, 14 Nov 2000 08:22:00 -0800 (PST)",
    "X-To" : "Eric Bass",
    "Message-ID" : "<6884142.1075854677416.JavaMail.evans@thyme>",
    "Content-Type" : "text/plain; charset=us-ascii",
    "Mime-Version" : "1.0"
  }
}
![Page 41: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/41.jpg)
{ _id : "[email protected]|[email protected]", value : 2 }
{ _id : "[email protected]|[email protected]", value : 2 }
{ _id : "[email protected]|[email protected]", value : 2 }
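The documents on this slide are keyed by `sender|recipient` with a count in `value`. The deck does not show the job that produces them, so the following is only a guess at the logic, written in plain Java rather than Spark: it assumes each message contributes one count per address in its `To` header and that multiple recipients are comma-separated. The `PairCounts` class and the `example.com` addresses are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: count how often each sender|recipient pair occurs.
final class PairCounts {
    // Each message is {from, to}; "to" may hold several comma-separated addresses.
    static Map<String, Integer> count(String[][] messages) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String[] message : messages) {
            String from = message[0].trim();
            for (String to : message[1].split(",")) {
                String recipient = to.trim();
                if (recipient.isEmpty()) {
                    continue; // skip empty entries such as a trailing comma
                }
                counts.merge(from + "|" + recipient, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[][] messages = {
            {"a@example.com", "b@example.com, c@example.com"},
            {"a@example.com", "b@example.com"},
        };
        // Print in the same shape as the slide: { _id : "from|to", value : n }
        count(messages).forEach((pair, n) ->
            System.out.println("{ _id : \"" + pair + "\", value : " + n + " }"));
    }
}
```

In a Spark job the same counting would typically be a pair-producing transformation followed by a reduce by key, but that version is not shown in the deck.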
![Page 42: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/42.jpg)
Eratosthenes
Democritus
Hypatia
Shemp
Euripides
![Page 43: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/43.jpg)
Spark Configuration
Configuration conf = new Configuration();
conf.set(
    "mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
conf.set(
    "mongo.input.uri", "mongodb://localhost:27017/db.collection");
![Page 44: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/44.jpg)
Spark Context
JavaPairRDD<Object, BSONObject> documents = context.newAPIHadoopRDD(
    conf,
    MongoInputFormat.class,
    Object.class,
    BSONObject.class
);
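The `mongo.input.uri` value set on the Spark Configuration slide packs the host, port, database and collection into one string. As a rough illustration of how that string breaks down, the sketch below takes it apart with `java.net.URI` from the standard library. It assumes the single-host form shown in the deck (`mongodb://host:port/db.collection`) and is not the connector's own parser; the `MongoInputUri` class is hypothetical.

```java
import java.net.URI;

// Hypothetical sketch: split a single-host input URI into its parts.
final class MongoInputUri {
    final String host;
    final int port;
    final String database;
    final String collection;

    MongoInputUri(String uri) {
        URI parsed = URI.create(uri);
        String path = parsed.getPath();
        if (path == null || path.length() < 2) {
            throw new IllegalArgumentException("missing db.collection in " + uri);
        }
        String namespace = path.substring(1); // drop the leading '/'
        int dot = namespace.indexOf('.');     // database ends at the first dot
        if (dot < 0) {
            throw new IllegalArgumentException("expected db.collection in " + uri);
        }
        this.host = parsed.getHost();
        this.port = parsed.getPort();
        this.database = namespace.substring(0, dot);
        this.collection = namespace.substring(dot + 1);
    }

    public static void main(String[] args) {
        MongoInputUri u = new MongoInputUri("mongodb://localhost:27017/db.collection");
        System.out.println(u.host + ":" + u.port + " " + u.database + " " + u.collection);
    }
}
```

Splitting at the first dot is deliberate: collection names may themselves contain dots, while database names may not.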
![Page 45: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/45.jpg)
![Page 46: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/46.jpg)
![Page 47: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/47.jpg)
![Page 48: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/48.jpg)
![Page 49: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/49.jpg)
mongos
mongos
Data Services
![Page 50: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/50.jpg)
Deployment Artifacts
Hadoop Connector Jar
Fat Jar
Java Driver Jar
![Page 51: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/51.jpg)
Spark Submit
/usr/local/spark-1.5.1/bin/spark-submit \
    --class com.mongodb.spark.examples.DataframeExample \
    --master local Examples-1.0-SNAPSHOT.jar
![Page 52: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/52.jpg)
Stand Alone
YARN
Spark
Mesos
SparkSQL
SparkShell
SparkStreaming
![Page 53: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/53.jpg)
JavaRDD<Message> messages = documents.map(
    new Function<Tuple2<Object, BSONObject>, Message>() {
        public Message call(Tuple2<Object, BSONObject> tuple) {
            BSONObject header = (BSONObject) tuple._2.get("headers");

            Message m = new Message();
            m.setTo((String) header.get("To"));
            m.setX_From((String) header.get("From"));
            m.setMessage_ID((String) header.get("Message-ID"));
            m.setBody((String) tuple._2.get("body"));

            return m;
        }
    }
);
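The `Message` class used in the mapping above is not shown in the deck. A minimal guess at its shape, inferred only from the setters that are called (`setTo`, `setX_From`, `setMessage_ID`, `setBody`), is a plain serializable bean like the one below. The field names and getters are assumptions.

```java
import java.io.Serializable;

// Hypothetical bean matching the setters used in the mapping function.
// Serializable because Spark ships objects between the driver and executors.
// In a real project this would likely be a public class in its own file.
class Message implements Serializable {
    private static final long serialVersionUID = 1L;

    private String to;
    private String x_From;
    private String message_ID;
    private String body;

    public String getTo() { return to; }
    public void setTo(String to) { this.to = to; }

    public String getX_From() { return x_From; }
    public void setX_From(String x_From) { this.x_From = x_From; }

    public String getMessage_ID() { return message_ID; }
    public void setMessage_ID(String message_ID) { this.message_ID = message_ID; }

    public String getBody() { return body; }
    public void setBody(String body) { this.body = body; }

    public static void main(String[] args) {
        Message m = new Message();
        m.setTo("recipient@example.com");   // placeholder address
        m.setX_From("sender@example.com");  // placeholder address
        m.setMessage_ID("<1@example>");
        m.setBody("hello");
        System.out.println(m.getX_From() + " -> " + m.getTo() + ": " + m.getBody());
    }
}
```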
![Page 54: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/54.jpg)
MongoDB & Spark
code demo
![Page 55: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/55.jpg)
THE FUTURE AND
BEYOND THE INFINITE
![Page 56: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/56.jpg)
Stand Alone
YARN
Spark
Mesos
SparkSQL
SparkShell
SparkStreaming
![Page 57: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/57.jpg)
![Page 58: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/58.jpg)
![Page 59: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/59.jpg)
![Page 60: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/60.jpg)
MongoDB + Spark
![Page 61: MongoDB & Spark](https://reader036.vdocuments.mx/reader036/viewer/2022062822/587b86531a28ab9d448b61eb/html5/thumbnails/61.jpg)
THANKS!
{
  name: 'Bryan Reinero',
  role: 'Developer Advocate',
  twitter: '@blimpyacht',
  email: