Data storage: which system for which use?
#DevoxxMA @zouheircadi
STORAGE: WHICH SYSTEM FOR WHICH USE?
WHO AM I
• @ZouheirCADI
• JEE architect (big data, perf., quality, ops, app, …)
• Lecturer at ENST
• Co-organizer of Devoxx France
• (former…) Co-organizer of the Paris Java User Group
AGENDA
• Review of storage systems (OLAP and OLTP)
  • RDBMS
  • OLAP (Hadoop and Spark)
  • OLTP
    • Key-Value: memcached
    • Document: couchdb
    • Column family
    • Search
• Conclusion
Why?
• Share data
  • Many users
• Expose a data model
  • An organized one?
• Scalability
  • Depending on users or data processing
• Flexibility
  • Embrace change
RDBMS
Key date
• 80s
RDBMS
Relational Database Management Systems were invented to let you use one set of data in multiple ways, including ways that are unforeseen at the time the database is built and the 1st applications are written.
Curt Monash, analyst/blogger
RDBMS
• Relational databases organize data in tables
• Which are made of many rows.
• Each row has data in each of several columns (every row in a table has the same columns)
• Relationships are implicit
Emp
empno  ename  job        deptno
7839   King   President  10
7698   Blake  Manager    20
Dept
deptno  dname    loc
10      Account  NY
20      Sales    CHI
RDBMS – KEY CONCEPTS
1st: Physical data independence
PHYSICAL FILES / LOGICAL MODEL
fseek, fopen, fread
© http://www.slideshare.net/billhoweuw/dataintensive-scalable-science
2nd: Relational algebra
• Select, Project, Join
• Union, Intersection, Difference
© http://www.slideshare.net/billhoweuw/dataintensive-scalable-science
RDBMS
• Logical expression of queries
SELECT e.ename, d.dname
FROM EMP e JOIN DEPT d ON e.deptno = d.deptno
WHERE e.ename = 'King'
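To make the logical query concrete, here is a minimal sketch that runs the same join against an in-memory SQLite database. SQLite, the column types and the lower-case table names are assumptions made for the illustration; the rows are the Emp/Dept sample from the earlier slide.

```python
import sqlite3

# In-memory database; the schema below is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT, loc TEXT);
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, job TEXT,
                      deptno INTEGER REFERENCES dept(deptno));
    INSERT INTO dept VALUES (10, 'Account', 'NY'), (20, 'Sales', 'CHI');
    INSERT INTO emp VALUES (7839, 'King', 'President', 10),
                           (7698, 'Blake', 'Manager', 20);
""")

# The query only states WHAT is wanted; the engine decides HOW to fetch it.
rows = conn.execute(
    "SELECT e.ename, d.dname FROM emp e "
    "JOIN dept d ON e.deptno = d.deptno WHERE e.ename = ?",
    ("King",),
).fetchall()
print(rows)
```

The application never mentions files, offsets or access paths, which is the physical data independence the previous slides describe.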
Table scan, Table scan → Hash match → Select
Table scan, Table scan → Nested loops → Select
Select T1.Col2 From Table1 T1 Inner Join Table2 T2 ON T1.Col1 = T2.Col1
Select T1.Col2 From Table1 T1 Inner Join Table2 T2 ON T1.Col1 = T2.Col1 Where T1.col1 = 1
© https://sqlcommitted.wordpress.com/tag/hash-match-join/
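The two physical operators named in these plans can be sketched in a few lines. This is a simplified illustration of the general techniques, not the code of any particular database engine; the sample rows and function names are hypothetical.

```python
def nested_loop_join(left, right, key):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    return [(l, r) for l in left for r in right if l[key] == r[key]]


def hash_join(left, right, key):
    """Build a hash table on one input, then probe it with the other."""
    buckets = {}
    for r in right:
        buckets.setdefault(r[key], []).append(r)
    return [(l, r) for l in left for r in buckets.get(l[key], [])]


# Hypothetical sample rows, named after the tables in the queries above.
table1 = [{"Col1": 1, "Col2": "x"}, {"Col1": 2, "Col2": "y"}]
table2 = [{"Col1": 1}, {"Col1": 2}, {"Col1": 3}]

nested = nested_loop_join(table1, table2, "Col1")
hashed = hash_join(table1, table2, "Col1")
```

Both functions return the same pairs; the optimizer's job is to pick the cheaper strategy for the data at hand, which is why the same logical join can end up with different plans.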
Transaction
Atomicity: Transactions are all or nothing
Consistency: Only valid data is saved
Isolation: Transactions do not affect each other
Durability: Written data will not be lost
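Atomicity and consistency can be observed with a small experiment. The sketch below assumes SQLite and a hypothetical `account` table with a CHECK constraint; a transfer whose second step violates the constraint is rolled back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK constraint defines what "valid data" means.
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    # "with conn" commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance + 150 WHERE name = 'bob'")
        # Violates the CHECK constraint: alice would drop to -50.
        conn.execute(
            "UPDATE account SET balance = balance - 150 WHERE name = 'alice'")
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)
```

The credit to bob succeeded on its own, yet it does not survive: the failed debit takes the whole transaction down with it.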
Indexes
• Easy to produce
• Easy to use
Scalability
• Vertical scalability (scale up/down)
  • More resources to a single node
Scalability
• Horizontal scalability (scale out/in)
  • Add more nodes to a system
Shortcomings
• Scalability (almost not scalable…)
• SPOF
• Difficult to serve users worldwide
NoSQL
• Not Only SQL
• Nothing to do with SQL
• Relaxation of transaction constraints in distributed systems
• CAP
CAP
• Consistency
  • Every read receives the most recent write or an error
• Availability
  • Every request receives a response, without guarantee that it contains the most recent version
• Partition tolerance
  • The system continues to operate despite arbitrary partitioning due to network failure
  • If allowed, you might sacrifice consistency
  • If not, you might sacrifice availability
• NoSQL may sacrifice consistency
https://en.wikipedia.org/wiki/CAP_theorem
NoSQL
• More pragmatically
  • Partitioning (load distribution)
  • Replication (fault tolerance)
  • Horizontal scalability
    • On commodity hardware
• Simple API
• OLTP
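Partitioning and replication can be illustrated with a toy placement function. This is a deliberately naive sketch (modulo hashing over a fixed node list), not the scheme of any specific product; the node names and the replica count are hypothetical.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names
REPLICAS = 2                            # copies kept of every key (assumption)


def owners(key):
    """Pick REPLICAS consecutive nodes, starting from a hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICAS)]


# Each key lands on two distinct nodes: partitioning spreads the load,
# replication keeps the key readable if one of its owners fails.
placement = {key: owners(key) for key in ["user:1", "user:2", "user:3"]}
```

Real systems usually refine this with consistent hashing so that adding a node moves only a fraction of the keys.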
OLAP
M/R
Key dates
• October 2003: GFS paper released
• December 2004: MapReduce: Simplified Data Processing on Large Clusters
• January 2006: Hadoop created
• October 2006: 600-machine Hadoop cluster at Yahoo
• April 2007: 1000-machine Hadoop cluster at Yahoo
https://en.wikipedia.org/wiki/Apache_Hadoop
Map Reduce
• MR
  • Abstraction
  • Programming model
• Implementations
  • Open source
    • Hadoop
    • Less well known: CouchDB, Infinispan, Riak
  • Proprietary: Google
Map Reduce
• MapReduce is
  • a high-level programming model
  • and an associated implementation
  • for processing and generating large data sets
  • with a parallel, distributed algorithm on a cluster.
© https://en.wikipedia.org/wiki/MapReduce
Diagram: map(), map(), map() → <key, value> → reduce(), reduce()
Input:
  devoxx morocco devoxx france devoxx poland great conference great conference devoxx taroudant
Split:
  devoxx morocco devoxx france
  devoxx poland great conference
  great conference devoxx taroudant
Map:
  devoxx,1 morocco,1 devoxx,1 france,1
  devoxx,1 poland,1 great,1 conference,1
  great,1 conference,1 devoxx,1 taroudant,1
Shuffle:
  devoxx,1 devoxx,1 devoxx,1 devoxx,1 morocco,1 france,1
  poland,1 great,1 conference,1 great,1 conference,1 taroudant,1
Reduce:
  devoxx,4 morocco,1 france,1
  poland,1 great,2 conference,2 taroudant,1
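The word count above can be replayed on a single machine to see what each phase does. This is a minimal sketch of the programming model in plain Python, not Hadoop code; the function names are hypothetical.

```python
from collections import defaultdict

# The three input splits from the word-count example.
splits = [
    "devoxx morocco devoxx france",
    "devoxx poland great conference",
    "great conference devoxx taroudant",
]


def map_phase(line):
    """Emit one (word, 1) pair per word."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Aggregate the values of one key."""
    return key, sum(values)


mapped = [pair for line in splits for pair in map_phase(line)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
print(counts)
```

In a real cluster each split is mapped on the node that stores it and the shuffle moves pairs over the network; only the two user-supplied functions, map and reduce, stay the same.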
Hadoop structure
• Data storage: HDFS
• Data processing: MapReduce
© https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
© http://stackoverflow.com/questions/31044575/mapreduce-2-vs-yarn-applications
© https://www.mapr.com/blog/how-job-execution-framework-mapreduce-v1-v2
© Hadoop: The Definitive Guide, Third Edition, Tom White, O'Reilly Ed.
Hadoop ecosystem
© Hadoop: The Definitive Guide, Third Edition, Tom White, O'Reilly Ed.
Hadoop conclusion
• Read-only data with simple processing
  • Map-Reduce
  • Move computation to data
• Parallelization and distribution (high scalability)
• Fault tolerance
• Status and monitoring
• "One-person deployment"
© https://en.wikipedia.org/wiki/MapReduce
When?
BIG DATA: VOLUME, VELOCITY, VARIETY
Software companies
M/R shortcomings
• Forces your pipeline into Map/Reduce tasks
  • Other workflows (filter, join, map-reduce-map…)
• Reads from disk for every M/R task
  • Iterative algorithms
• Only native Java programming interface
  • Support for other languages: streaming module
  • Interactive shell
Hadoop conclusion
• Major slowness problem
  • MapReduce is slow, but it is currently the only option for processing data on HDFS
• Contradictory vendor roadmaps
  • Vendor strategy (Google)
Hadoop conclusion
• Map-Reduce has served a great purpose, though: many, many companies, research labs and individuals are successfully bringing Map-Reduce to bear on problems to which it is suited: brute-force processing with an optional aggregation.
http://the-paper-trail.org/blog/the-elephant-was-a-trojan-horse-on-the-death-of-map-reduce-at-google/
Hadoop conclusion
• But more important in the longer term, to my mind, is the way that Map-Reduce provided the justification for re-evaluating the ways in which large-scale data processing platforms are built (and purchased!).
http://the-paper-trail.org/blog/the-elephant-was-a-trojan-horse-on-the-death-of-map-reduce-at-google/
Hadoop conclusion
• It's well known in the industry that more than 10 years ago Google invented MapReduce, the technology at the heart of first-generation Hadoop. It's less well known that Google moved away from MapReduce several years ago. Today at its Google I/O 2014…
https://www.datanami.com/2014/06/25/google-re-imagines-mapreduce-launches-dataflow/
Hadoop conclusion
• …Today at its Google I/O 2014 conference, the Web giant unveiled a possible successor to MapReduce called Dataflow, which it's selling through its hosted cloud service.
https://www.datanami.com/2014/06/25/google-re-imagines-mapreduce-launches-dataflow/
Spark
Key dates
• 2009: AMPLab, University of California, Berkeley
• Original aim: proof of concept for Mesos
• 2012: 0.5.1
Diagram: the driver node (driver program, SparkContext) connects through the cluster manager to the worker nodes; each worker node runs an executor with a cache and tasks.
Spark
• Resilient Distributed Datasets (RDD)
  • An RDD is a resilient and distributed collection of records
• Motivation
  • Iterative algorithms in machine learning
• Supports 2 types of operations
  • Transformations
  • Actions
Spark - RDD
Diagram: one RDD spread across Server 1, Server 2 and Server 3
Spark
• Transformations
  • Functions that return another RDD
  • map
  • flatMap
  • filter
  • coalesce
  • groupByKey
Spark – Transformation: map
Hello World
This Is Devoxx
Morocco
Held In
Casablanca
.map(_.toLowerCase)
hello world
this is devoxx
morocco
held in
casablanca
Spark – Transformation: flatMap
hello world
this is devoxx
morocco
held in
casablanca
.flatMap(line => line.split("\\s+"))
hello
world
this
is
…
devoxx
Spark – Transformation: map
hello
world
this
is
…
devoxx
.map(word => (word, 1))
(hello,1)
(world,1)
(this,1)
(is,1)
…
(devoxx,1)
Spark – Transformation: groupByKey
(a,1) (b,1)
(a,1) (a,1) (b,1) (b,1)
(a,1) (a,1) (a,1) (b,1) (b,1) (b,1)
(a,1) (a,1) (b,1) (b,1)
(a,1) (a,1) (a,1) (b,1) (b,1) (b,1)
Spark – Transformation: reduceByKey
(a,1) (b,1)
(a,1) (a,1) (b,1) (b,1)
(a,1) (a,1) (a,1) (b,1) (b,1) (b,1)
(a,1) (a,1) (a,1) (a,1) (a,1) (a,1) → (a,6)
(b,1) (b,1) (b,1) (b,1) (b,1) (b,1) → (b,6)
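The practical difference between the two transformations is how much data crosses the network. The sketch below imitates both in plain Python over the three partitions shown above; it is not the Spark API, and the function names and the "pairs moved" counters are hypothetical. It relies on the documented behaviour that reduceByKey merges values locally in each partition before the shuffle.

```python
from collections import defaultdict

# The three partitions from the slides.
partitions = [
    [("a", 1), ("b", 1)],
    [("a", 1), ("a", 1), ("b", 1), ("b", 1)],
    [("a", 1), ("a", 1), ("a", 1), ("b", 1), ("b", 1), ("b", 1)],
]


def group_by_key(parts):
    """Ship every pair across the network, then group values per key."""
    shuffled = [pair for part in parts for pair in part]
    groups = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)
    return dict(groups), len(shuffled)


def reduce_by_key(parts, func):
    """Combine inside each partition first, then merge the partial results."""
    partials = []
    for part in parts:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        partials.extend(local.items())
    merged = {}
    for key, value in partials:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged, len(partials)


grouped, pairs_moved_group = group_by_key(partitions)
reduced, pairs_moved_reduce = reduce_by_key(partitions, lambda x, y: x + y)
```

Both approaches reach the same totals, but the local combine halves the number of pairs shuffled in this example, and the gap grows with the number of repeated keys per partition.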
Spark
• Actions
  • Functions that trigger computation and return something that isn't an RDD
  • collect(): copy all elements to the driver
  • count()
  • collectAsMap()
  • takeSample()
  • take(n): copy the first n elements
  • reduce(func): aggregates elements with func (takes 2 elements, returns one)
  • saveAsTextFile(fileName): save to local storage or HDFS
All in one
import org.apache.spark.SparkContext

// N: number of most frequent words to keep (defined elsewhere)
val sc = new SparkContext()
val docs = sc.textFile("hdfs://<path>")
val low = docs.map(line => line.toLowerCase)
val words = low.flatMap(line => line.split("\\s+"))
val counts = words.map(word => (word, 1))
val frequency = counts.reduceByKey(_ + _)
// Swap to (count, word) so that top(N) orders by count
val top = frequency.map(_.swap).top(N)
top.foreach(println)
Spark
• Caching
  • By default, each job is reprocessed from HDFS
  • The .cache() method on an RDD triggers caching
  • Applied at the first computation (lazy)
Spark
• Directed Acyclic Graphs (DAGs)
  • Nodes are RDDs
  • Arrows are transformations
Spark
• Batch
• Streaming
• Iterative
• Interactive
GOOGLE TRENDS: SPARK vs. STORM vs. HIVE
OLTP
Key dates
• BigTable (Google): 2004
• Dynamo (Amazon): 2007
Data model
• Key-Value
• Document
• Column
Key-value
• Associative array (map)
• Query model: PUT, GET, DELETE
KEY → VALUE
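The whole query model fits in a few lines. The class below is a toy in-memory sketch with hypothetical names, meant only to show how narrow the interface is: the store never looks inside the value.

```python
class KeyValueStore:
    """Toy in-memory store exposing only PUT, GET and DELETE."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Deleting a missing key is a no-op in this sketch.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", b"opaque bytes")
value = store.get("session:42")
store.delete("session:42")
missing = store.get("session:42")
```

Because every operation addresses exactly one key, such a store is easy to partition across nodes; the price is that any query other than a lookup by key has to be done by the application.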
Document
{
  "id": "987GREHLKE878YEFB",
  "images": ["url1", "url2", "url3"],
  "prix": "1290",
  "type": "APPARTEMENT",
  "etage": "2",
  "pieces": "2",
  "chambres": "1",
  "surface": "20",
  "description": "desc...",
  "ville": "PARIS",
  "arrondissement": "75004",
  "departement": "IDF"
}
Document
• Standard encoding format: JSON, BSON, …
• Query model
  • CRUD (Create, Read, Update, Delete)
  • Select based on document content
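Selecting on document content, as opposed to looking up by key, can be sketched as a filter over a collection. The documents, the `find` helper and the extra `balcon` field are hypothetical; real document stores add indexes and a richer query language on top of the same idea.

```python
# Hypothetical collection: documents need not share the same fields.
documents = [
    {"id": "a1", "type": "APPARTEMENT", "ville": "PARIS", "prix": 1290},
    {"id": "b2", "type": "MAISON", "ville": "LYON", "prix": 2100},
    {"id": "c3", "type": "APPARTEMENT", "ville": "PARIS", "prix": 1800,
     "balcon": True},
]


def find(docs, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [d for d in docs
            if all(d.get(field) == wanted for field, wanted in criteria.items())]


matches = find(documents, type="APPARTEMENT", ville="PARIS")
with_balcony = find(documents, balcon=True)
```

The second query works even though only one document carries the `balcon` field, which is the schema flexibility that sets document stores apart from fixed-column tables.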
{Column}
• Column family stores
  • BigTable, HBase, Hypertable, Cassandra
• Column stores
  • C-Store, Vertica
© http://dbmsmusings.blogspot.fr/2010/03/distinguishing-two-major-types-of_29.html
Data model
Relational DB   Databases  Tables       Rows       Columns
MongoDB         db         Collections  Documents  Fields
ElasticSearch   Indices    Types        Documents  Fields
© http://www.slideshare.net/yellow7/cassandra-backgroundandarchitecture
Column family stores
• Persistent (distributed) maps
Column family stores
Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
© http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/
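The nested-map view translates almost directly into code. The sketch below is an in-memory illustration with hypothetical class, method and key names; it sorts columns on read to mimic the SortedMap, whereas a real store keeps them sorted on disk.

```python
from collections import defaultdict


class ColumnFamily:
    """Map<RowKey, SortedMap<ColumnKey, ColumnValue>> built from plain dicts."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column_key, value):
        self._rows[row_key][column_key] = value

    def get_row(self, row_key):
        # Columns come back ordered by column key, like a SortedMap.
        # .get avoids creating an empty row when the key is unknown.
        return sorted(self._rows.get(row_key, {}).items())

    def slice(self, row_key, start, end):
        """Columns whose key falls in [start, end], in key order."""
        return [(c, v) for c, v in self.get_row(row_key) if start <= c <= end]


events = ColumnFamily()
events.put("user:1", "2016-11-03", "logout")
events.put("user:1", "2016-11-01", "login")
events.put("user:1", "2016-11-02", "search")
recent = events.slice("user:1", "2016-11-02", "2016-11-03")
```

Keeping columns sorted inside a row is what makes range reads such as "all events of user:1 between two dates" cheap, and it is why the choice of column key matters so much when modelling data for these stores.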
Column family stores
Map<RowKey, SortedMap<SuperColumnKey, SortedMap<ColumnKey, ColumnValue>>>
© http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/
Column family stores (BigTable)
© https://cloud.google.com/bigtable/docs/schema-design
Replication model
• Master-less
  • Cassandra, DynamoDB, Riak
• Master-slave
  • MongoDB, Redis, HBase
• Master-master (or master-slave)
  • CouchDB
© http://www.slideshare.net/yellow7/cassandra-backgroundandarchitecture
Comparison criteria
• Data model
• Query model
• Replication model
• Consistency model
• Licensing, support, community
Comparison criteria
• Data model
• Query model
• Replication model
• Consistency model
• Licensing, …
System Architecture
Why the schema-less explosion
Why the schema-less explosion
• Start-ups vs. old-school enterprises
  • (with a very short TTM)
Why the schema-less explosion
• Allowed by business rules
Why the schema-less explosion: 3V
Constraints on using NoSQL
• Transactions
  • Pushing conflict resolution onto the client cannot be considered progress.
  • A necessary evil, often dictated by the business
http://db-engines.com/en/ranking
https://www.gartner.com/doc/reprints?id=1-2PMFPEN&ct=151013&st=sb
https://www.google.com/trends/explore?date=2008-03-18%202016-10-18&q=RDBMS,NOSQL
Why?
• Share data
  • Many users
• Expose a data model
  • An organized one?
• Scalability
  • Depending on users or data processing
• Flexibility
  • Embrace change
URL REFERENCES
• Hadoop: The Definitive Guide, Third Edition, Tom White, ISBN: 978-1-449-31152-0, O'Reilly Ed.
• https://www.postgresql.org/about/
• https://blog.codeship.com/unleash-the-power-of-storing-json-in-postgres/
• https://opentextbc.ca/dbdesign/chapter/chapter-5-data-modelling/
• http://coronet.iicm.edu/is/scripts/lesson03.pdf
• https://opentextbc.ca/dbdesign/chapter/chapter-3-characteristics-and-benefits-of-a-database/
• http://gerardnico.com/wiki/relation/rdbms
• https://en.wikipedia.org/wiki/Scalability
• http://siliconangle.com/blog/2016/06/27/google-tools-up-with-its-spanner-database-looks-for-a-fight-with-aws/
• http://www.cattell.net/datastores/Datastores.pdf
• https://en.wikipedia.org/wiki/Apache_Hadoop
• https://en.wikipedia.org/wiki/MapReduce
• https://www.linkedin.com/pulse/rdbms-follows-acid-property-nosql-databases-base-does
• https://www.quora.com/Hadoop-Why-are-companies-investing-so-much-into-Hadoop-if-Google-released-the-MapReduce-paper-back-in-2004-Are-companies-just-going-to-follow-the-road-map-Google-created-Big-Table-Pregel-Dremel-etc-It-seems-to-me-that-companies-will-always-be-behind-the-curve
• http://the-paper-trail.org/blog/the-elephant-was-a-trojan-horse-on-the-death-of-map-reduce-at-google/
• https://www.mapr.com/ebooks/spark/01-what-is-apache-spark.html
• https://www.digitalocean.com/community/tutorials/a-comparison-of-nosql-database-management-systems-and-models
• https://cloud.google.com/bigtable/docs/overview
• https://cloud.google.com/bigtable/docs/schema-design
• https://en.wikipedia.org/wiki/Dremel_(software)
• https://www.gartner.com/doc/reprints?id=1-2PMFPEN&ct=151013&st=sb
• http://www.infoworld.com/article/3056637/database/nosql-chips-away-at-oracle-ibm-and-microsoft-dominance.html
• http://www.slideshare.net/billhoweuw/dataintensive-scalable-science