nosql and column store2.amazon’s dynamo 3.google’s bigtable duke cs, fall 2017 compsci 516:...

68
CompSci 516 Database Systems Lecture 19 NoSQL and Column Store Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1

Upload: others

Post on 06-Apr-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

CompSci 516DatabaseSystems

Lecture19NoSQLand

ColumnStore

Instructor:Sudeepa Roy

DukeCS,Fall2017 CompSci516:DatabaseSystems 1

Page 2: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

Announcements

• HW3releasedonSakai– DueonMonday,Nov20,11:55pm(in2weeks)– Startsoon,finishsoon!– Youcanlearnaboutconceptualquestionsfromonlinematerial,butmustwriteyourownanswer

• Keepworkingonyourprojecttoo!

DukeCS,Fall2017 CompSci516:DatabaseSystems 2

Page 3: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

ReadingMaterialNOSQL:• “ScalableSQLandNoSQLDataStores”RickCattell,SIGMODRecord,December2010(Vol.39,No.4)• seewebpagehttp://cattell.net/datastores/ forupdatesandmorepointers• MongoDBmanual:https://docs.mongodb.com/manual/

ColumnStore:• D.Abadi,P.Boncz,S.Harizopoulos,S.Idreos andS.Madden.TheDesignandImplementationof

ModernColumn-OrientedDatabaseSystems.FoundationsandTrendsinDatabases,vol.5,no.3,pp.197–280,2012.

• SeeVLDB2009tutorial:http://nms.csail.mit.edu/~stavros/pubs/tutorial2009-column_stores.pdf

Optional:• “Dynamo:Amazon’sHighlyAvailableKey-valueStore”ByGiuseppeDeCandia et.al.SOSP

2007

• “Bigtable:ADistributedStorageSystemforStructuredData”FayChanget.al.OSDI2006

DukeCS,Fall2017 CompSci516:DatabaseSystems 3

Page 4: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

NoSQL

DukeCS,Fall2017 CompSci516:DatabaseSystems 4

Page 5: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

DukeCS,Fall2017 CompSci516:DatabaseSystems 5

Page 6: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

Sofar-- RDBMS

• RelationalDataModel• RelationalDatabaseSystems(RDBMS)• RDBMSshave– acompletepre-definedfixedschema– aSQLinterface– andACIDtransactions

DukeCS,Fall2017 CompSci516:DatabaseSystems 6

Page 7: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

Today• NoSQL:”new”databasesystems– nottypicallyRDBMS– relaxonsomerequirements,gainefficiencyandscalability

• Newsystemschoosetouse/notuseseveralconceptswelearntsofar– e.g.“System---”doesnotuselocksbutusesmulti-versionCC(MVCC)or,

– “System---”usesasynchronousreplication• therefore,itisimportanttounderstandthebasics(Lectures1-18)eveniftheyarenotusedinsomenewsystems!

DukeCS,Fall2017 CompSci516:DatabaseSystems 7

Page 8: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

Warnings!

• MaterialfromCattell’spaper(2010-11)–someinfowillbeoutdated– seewebpagehttp://cattell.net/datastores/ forupdatesandmorepointers

• WewillfocusonthebasicideasofNoSQLsystems

• Optional readingslidesattheendonMongoDB– maybeusefulforHW3– therearealsocomparisontablesintheCattell’spaperifyouareinterested

DukeCS,Fall2017 CompSci516:DatabaseSystems 8

Page 9: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

OLAPvs.OLTP

• OLTP(OnLine TransactionProcessing)– Recalltransactions!– Multipleconcurrentread-writerequests– Commercialapplications(banking,onlineshopping)– Datachangesfrequently– ACIDproperties,concurrencycontrol,recovery

• OLAP(OnLine AnalyticalProcessing)– Manyaggregate/group-byqueries– multidimensionaldata– Datamostlystatic– WillstudyOLAPCubesoon

DukeCS,Fall2017 CompSci516:DatabaseSystems 9

Page 10: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

NewSystems• WewillexamineanumberofSQLandso- called“NoSQL”systemsor“datastores”

• DesignedtoscalesimpleOLTP-styleapplicationloads– todoupdatesaswellasreads– incontrasttotraditionalDBMSsanddatawarehouses– toprovidegoodhorizontalscalability(?)forsimpleread/writedatabaseoperationsdistributedovermanyservers

• OriginallymotivatedbyWeb2.0applications– thesesystemsaredesignedtoscaletothousandsormillionsofusers

DukeCS,Fall2017 CompSci516:DatabaseSystems 10

Page 11: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

NewSystemsvs.RDMS• Whenyoustudyanewsystem,compareitwithRDBMS-sonits– datamodel– consistencymechanisms– storagemechanisms– durabilityguarantees– availability– querysupport

• Thesesystemstypicallysacrificesomeofthesedimensions– e.g.database-widetransactionconsistency,inordertoachieveothers,e.g.higheravailabilityandscalability

DukeCS,Fall2017 CompSci516:DatabaseSystems 11

Page 12: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

NoSQL

• Manyofthenewsystemsarereferredtoas“NoSQL”datastores

• NoSQLstandsfor“NotOnlySQL”or“NotRelational”– notentirelyagreedupon

• Next:sixkeyfeaturesofNoSQLsystems

DukeCS,Fall2017 CompSci516:DatabaseSystems 12

Page 13: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

NoSQL:SixKeyFeatures

1. theabilitytohorizontallyscale“simpleoperations”throughputovermanyservers

2. theabilitytoreplicateandtodistribute(partition)dataovermanyservers

3. asimplecalllevelinterfaceorprotocol(incontrasttoSQLbinding)

4. aweakerconcurrencymodelthantheACIDtransactionsofmostrelational(SQL)databasesystems

5. efficientuseofdistributedindexesandRAMfordatastorage6. theabilitytodynamicallyaddnewattributestodatarecords

DukeCS,Fall2017 CompSci516:DatabaseSystems 13

Page 14: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

ImportantExamplesofNewSystems

• Threesystemsprovideda“proofofconcept”andinspiredmanyotherdatastores

1. Memcached2. Amazon’sDynamo3. Google’sBigTable

DukeCS,Fall2017 CompSci516:DatabaseSystems 14

Page 15: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

1.Memcached:mainfeatures

• popularopensourcecache

• supportsdistributedhashing(later)

• demonstratedthatin-memoryindexes canbehighlyscalable,distributing andreplicatingobjectsovermultiplenodes

DukeCS,Fall2017 CompSci516:DatabaseSystems 15

Page 16: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

2.Dynamo:mainfeatures

• pioneeredtheideaofeventualconsistencyasawaytoachievehigheravailabilityandscalability

• datafetchedarenotguaranteedtobeup-to-date

• butupdatesareguaranteedtobepropagatedtoallnodeseventually

DukeCS,Fall2017 CompSci516:DatabaseSystems 16

Page 17: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

3.BigTable :mainfeatures

• demonstratedthatpersistentrecordstoragecouldbescaledtothousandsofnodes

• “columnfamilies”

• https://cloud.google.com/bigtable/• https://static.googleusercontent.com/media/research.google.co

m/en//archive/bigtable-osdi06.pdf

DukeCS,Fall2017 CompSci516:DatabaseSystems 17

Page 18: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

BASE(notACIDJ)

• RecallACIDforRDBMSdesiredpropertiesoftransactions:– Atomicity,Consistency,Isolation,andDurability

• NOSQLsystemstypicallydonotprovideACID

• BasicallyAvailable• Softstate• Eventuallyconsistent

DukeCS,Fall2017 CompSci516:DatabaseSystems 18

Page 19: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

ACIDvs.BASE• TheideaisthatbygivingupACIDconstraints,onecanachievemuchhigherperformanceandscalability

• Thesystemsdifferinhowmuchtheygiveup– e.g.mostofthesystemscallthemselves“eventuallyconsistent”,meaningthatupdatesareeventuallypropagatedtoallnodes

– butmanyofthemprovidemechanismsforsomedegreeofconsistency,suchasmulti-versionconcurrencycontrol(MVCC)

DukeCS,Fall2017 CompSci516:DatabaseSystems 19

Page 20: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

“CAP”Theorem

• OftenEricBrewer’sCAPtheoremcitedforNoSQL

• A systemcanhaveonlytwooutofthreeofthefollowingproperties:• Consistency– doallclientsseethesamedata?

• Availability– isthesystemalwayson?

• Partition-tolerance– evenifcommunicationisunreliable,doesthesystemfunction?

• TheNoSQLsystemsgenerallygiveupconsistency– However,thetrade-offsarecomplex

DukeCS,Fall2017 CompSci516:DatabaseSystems 20

Page 21: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

TwofociforNoSQLsystems

1. “Simple”operations

2. HorizontalScalability

DukeCS,Fall2017 CompSci516:DatabaseSystems 21

Page 22: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

1.“Simple”Operations

• Readingorwritingasmallnumberofrelatedrecordsineachoperation– e.g.keylookups– readsandwritesofonerecordorasmallnumberofrecords

• Thisisincontrasttocomplexqueries,joins,orread-mostlyaccess

• Inspiredbyweb,wheremillionsofusersmaybothreadandwritedatainsimpleoperations– e.g.searchandupdatemulti-serverdatabasesofelectronic

mail,personalprofiles,webpostings,wikis,customerrecords,onlinedatingrecords,classifiedads,andmanyotherkindsofdata

DukeCS,Fall2017 CompSci516:DatabaseSystems 22

Page 23: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

2.HorizontalScalability

• Shared-NothingHorizontalScaling

• Theabilitytodistributeboththedataandtheloadofthesesimpleoperationsovermanyservers– withnoRAMordisksharedamongtheservers

• Not“vertical”scaling– whereadatabasesystemutilizesmanycoresand/orCPUsthatshareRAManddisks

• Someofthesystemswedescribeprovidebothverticalandhorizontalscalability

DukeCS,Fall2017 CompSci516:DatabaseSystems 23

Page 24: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

2.Horizontalvs.VerticalScaling

• Effectiveuseofmultiplecores(verticalscaling)isimportant– butthenumberofcoresthatcansharememoryislimited

• horizontalscalinggenerallyislessexpensive– canusecommodityservers

• Note:horizontalandverticalpartitioningarenotrelatedtohorizontalandverticalscaling (Lecture18)– exceptthattheyarebothusefulforhorizontalscaling

DukeCS,Fall2017 CompSci516:DatabaseSystems 24

Page 25: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

WhatisdifferentinNOSQLsystems

• WhenyoustudyanewNOSQLsystem,noticehowitdiffersfromRDBMSintermsof

1. ConcurrencyControl2. DataStorageMedium3. Replication4. Transactions

DukeCS,Fall2017 CompSci516:DatabaseSystems 25

Page 26: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

ChoicesinNOSQLsystems:1.ConcurrencyControl

a) Locks– somesystemsprovideone-user-at-a-timereadorupdatelocks– MongoDBprovideslockingatafieldlevel

b) MVCCc) None– donotprovideatomicity– multipleuserscaneditinparallel– noguaranteewhichversionyouwillread

d) ACID– pre-analyzetransactionstoavoidconflicts– nodeadlocksandnowaitsonlocks

DukeCS,Fall2017 CompSci516:DatabaseSystems 26

Page 27: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

ChoicesinNOSQLsystems:2.DataStorageMedium

a) StorageinRAM– snapshotsorreplicationtodisk– poorperformancewhenoverflowsRAM

b) Diskstorage– cachinginRAM

DukeCS,Fall2017 CompSci516:DatabaseSystems 27

Page 28: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

ChoicesinNOSQLsystems:3.Replication

• whethermirrorcopiesarealwaysinsynca) Synchronousb) Asynchronous– faster,butupdatesmaybelostinacrash

c) Both– localcopiessynchronously,remotecopies

asynchronously

DukeCS,Fall2017 CompSci516:DatabaseSystems 28

Page 29: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

ChoicesinNOSQLsystems:4.TransactionMechanisms

a) supportb) donotsupportc) inbetween– supportlocaltransactionsonlywithinasingle

objector“shard”– shard=ahorizontalpartitionofdataina

database

DukeCS,Fall2017 CompSci516:DatabaseSystems 29

Page 30: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

ComparisonfromCattell’spaper(2011)

DukeCS,Fall2017 CompSci516:DatabaseSystems 30

Page 31: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

DataModelTerminologyforNoSQL

• UnlikeSQL/RDBMS,theterminologyforNoSQLisofteninconsistent– wearefollowingnotationsinCattell’spaper

• Allsystemsprovideawaytostorescalarvalues– e.g.numbersandstrings

• Someofthemalsoprovideawaytostoremorecomplexnestedorreferencevalues

DukeCS,Fall2017 CompSci516:DatabaseSystems 31

Page 32: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

DataModelTerminologyforNoSQL

• Thesystemsallstoresetsofattribute-valuepairs– butusefourdifferentdatastructures

1. Tuple2. Document3. ExtensibleRecord4. Object

DukeCS,Fall2017 CompSci516:DatabaseSystems 32

Page 33: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

1.Tuple

• Sameasbefore• A“tuple”isarowinarelationaltable– attributenamesarepre-definedinaschema– thevaluesmustbescalar– thevaluesarereferencedbyattributename– incontrasttoanarrayorlist,wheretheyarereferencedbyordinalposition

DukeCS,Fall2017 CompSci516:DatabaseSystems 33

Page 34: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

2.Document

• Allowsvaluestobenesteddocumentsorlistsaswellasscalarvalues– thinkaboutXMLorJSON

• Theattributenamesaredynamicallydefinedforeachdocumentatruntime

• Adocumentdiffersfromatupleinthattheattributesarenotdefinedinaglobalschema– anda widerrangeofvaluesarepermitted

DukeCS,Fall2017 CompSci516:DatabaseSystems 34

Page 35: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

3.ExtensibleRecord

• A hybrid betweenatupleandadocument• familiesofattributesaredefinedinaschema• butnewattributescanbeadded(withinanattributefamily)onaper-recordbasis

• Attributesmaybelist-valued

DukeCS,Fall2017 CompSci516:DatabaseSystems 35

Page 36: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

4.Object

• Analogoustoanobjectinprogramminglanguages– butwithouttheproceduralmethods

• Valuesmaybereferencesornestedobjects

DukeCS,Fall2017 CompSci516:DatabaseSystems 36

Page 37: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

DataStoreCategories• Thedatastoresaregroupedaccordingtotheirdatamodel• Key-valueStores:

– storevaluesandanindextofindthem– basedonaprogrammer- definedkey

• DocumentStores:– storedocuments– Thedocumentsareindexedandasimplequerymechanismis

provided• ExtensibleRecordStores:

– storeextensiblerecordsthatcanbepartitionedverticallyandhorizontallyacrossnodes

– Somepaperscallthese“widecolumnstores”• RelationalDatabases:

– store(andindexandquery)tuples– e.g.thenewRDBMSsthatprovidehorizontalscaling

DukeCS,Fall2017 CompSci516:DatabaseSystems 37

Page 38: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

ExampleNOSQLsystems

• Key-valueStores:– ProjectVoldemort,Riak,Redis,Scalaris,TokyoCabinet,Memcached/Membrain/Membase

• DocumentStores:– AmazonSimpleDB,CouchDB,MongoDB,Terrastore

• ExtensibleRecordStores:– Hbase,HyperTable,Cassandra,Yahoo’sPNUTS

• RelationalDatabases:– MySQLCluster,VoltDB,Clustrix,ScaleDB,ScaleBase,NimbusDB,GoogleMegastore(alayeronBigTable)

DukeCS,Fall2017 CompSci516:DatabaseSystems 38

Page 39: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

UseCase:Key-valuestore

• ifyouhaveasimpleapplicationwithonlyonekindofobject,andyou onlyneedtolookupobjectsupbasedononeattribute

• Supposeyouhaveawebapplication– thatdoesmanyRDBMSqueriestocreateatailoredpagewhenauserlogsin

– Supposeittakesseveralsecondstoexecutethosequeries,andtheuser’sdataisrarelychanged

– youmightwanttostoretheuser’stailoredpageasasingleobjectinakey-valuestore

DukeCS,Fall2017 CompSci516:DatabaseSystems 39

Page 40: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

Usecase:DocumentStore

• applicationwithmultipledifferentkindsofobjects– e.g.inaDepartmentofMotorVehiclesapplication,withvehiclesanddrivers

• whereyouneedtolookupobjectsbasedonmultiplefields– e.g.,adriver’sname,licensenumber,ownedvehicle,orbirthdate

DukeCS,Fall2017 CompSci516:DatabaseSystems 40

Page 41: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

Usecase:ExtensibleRecordStore• usescasessimilartothosefordocumentstores:

– multiplekindsofobjects,withlookupsbasedonanyfield.

• However,aimedathigherthroughput,andmayprovidestrongerconcurrencyguarantees,– atthecostofslightlymorecomplexitythanthedocumentstores

• SupposestoringcustomerinformationforaneBay-styleapplication,andyouwanttopartitionyourdatabothhorizontallyandvertically:– clustercustomersbycountry,sothatyoucanefficientlysearchallof

thecustomersinonecountry– separatetherarely-changed“core”customerinformationsuchas

customeraddressesandemailaddressesinoneplace,and– putcertainfrequently-updatedcustomerinformation(suchascurrent

bidsinprogress)inadifferentplace,toimproveperformanceDukeCS,Fall2017 CompSci516:DatabaseSystems 41

Page 42: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

Usecase:ScalableRDBMS

• Ifyourapplicationrequiresmanytableswithdifferenttypesofdata– arelationalschemacentralizesandsimplifiesdatadefinitionandSQLsimplifiesoperations

– orforprojectswithmanyprogrammers• However,moreusefuliftheapplicationdoesnotrequire– updatesorjoinsthatspanmanynodes– transactioncoordination– or,datamovement

DukeCS,Fall2017 CompSci516:DatabaseSystems 42

Page 43: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

ConsistentHashing

DukeCS,Fall2017 CompSci516:DatabaseSystems 43

inDynamoDB

Page 44: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

ConsistentHashing(CH)• Recalldynamichashingschemes• Ifthe#ofslots(directorysize)changes,thenalmostallkeyshadtoberemapped

• Inconsistenthashing(CH),with#keys=Kand#slots=N,onlyK/Nkeysneedtoberemappedonaverage

• AppliestothedesignofDistributedHashTable(DHTs)forUniformLoadDistribution– partitionakeyspace amongasetofsites/nodes– additionallyprovideanoverlaynetworkthatconnectsnodessuchthatthenodesresponsibleforanykeycanbeefficientlylocated

DukeCS,Fall2017 CompSci516:DatabaseSystems 44

Page 45: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

DynamoDB :CH1/2• [ref.theDynamoDB paper,sec4.3]• Mustscaleincrementally• Consistenthashingisusedtodynamicallydistribute

dataarounda“ring”ofnodes(=sites)• Theoutputofahashfunctionistreatedasacircular

ring• Eachnodeisassignedarandomvalueinthisspace

– representsthe“position”onthering

DukeCS,Fall2017 CompSci516:DatabaseSystems 45

• Dataitemidentifiedbyakey• Assigntoanodebyhashing

thekeyto

Page 46: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

DynamoDB :CH2/2• Dataitemidentifiedbyakey• Assigntoanodebyhashingthekeytoyielditsposition

onthering• Walktheringclockwisetofindthefirstnodewitha

positionlargerthantheitem’sposition• Eachnodeisresponsiblefortheregioninthering

betweenitanditspredecessornodeonthering

DukeCS,Fall2017 CompSci516:DatabaseSystems 46

• Note:• departureorarrivalofanodeonly

affectsitsimmediateneighbor• Theothernodesremainunaffected• K/Nonaverage!

Page 47: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

DynamoDB:Replication• Dynamoreplicatesitsdataonmultiple(N)hostsforhigh

availabilityanddurability• Eachkeykisassignedtoacoordinatorwhichisinchargeof

replication– coordinatorhandlesallkeysinitsrange– Coordinatorreplicateseachkeyitisinchargeof

• bystoringitlocally• replicatingitattheN-1clockwisesuccesor nodesinthering

• EachnodeisinchargeofregionoftheringbetweenitanditsN-th predecessor

DukeCS,Fall2017 CompSci516:DatabaseSystems 47

NodeBreplicateskeyKatnodesCandDNodeDwillstorekeysintherange(A,B],(B,C],(C,D]Note:theremaybe<N“physical”nodes,uses“virtualnodes”

Page 48: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

CHHistory• ProposedbyCStheoreticiansfromMIT:

– Karger-Lehman-Leighton-Panigrahy-Levine-Lewin– “ConsistentHashingandRandomTrees:DistributedCachingProtocols

forRelievingHotSpotsontheWorldWideWeb”– STOC1997

• ConsistenthashinggavebirthtoAkamaiTechnologies– FoundedbyDannyLewinandTomLeightonin1998– Akamai’scontentdeliverynetworkisoneofthelargestdistributed

computingplatforms– Nowmarketcap$12Band6200employees– Managingweb-presenceofmanymajorcompanies

• 2001:TheconceptofDistributedHashTable(DHT)isproposed(howtolookforafile)andCHwasre-purposed

• NowusedinDynamo,Couchbase,Cassandra,Voldemort,Riak,..

DukeCS,Fall2017 CompSci516:DatabaseSystems 48

Page 49: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

SQLvs.NOSQL

DukeCS,Fall2017 CompSci516:DatabaseSystems 49

Argumentsforbothsidesstillacontroversialtopic

Page 50: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

WhychooseRDBMSoverNoSQL:1/31. Ifnewrelationalsystemscandoeverything

thataNoSQLsystemcan,withanalogousperformanceandscalability(?),andwiththeconvenienceoftransactionsandSQL,NoSQLisnotneeded

2. RelationalDBMSshavetakenandretainedmajoritymarketshareoverothercompetitorsinthepast30years– (network,object,andXMLDBMSs)

DukeCS,Fall2017 CompSci516:DatabaseSystems 50

Page 51: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

WhychooseRDBMSoverNoSQL:2/33. SuccessfulrelationalDBMSshavebeenbuilt

tohandleotherspecificapplicationloads inthepast:– read-onlyorread-mostlydatawarehousing– OLTPonmulti-coremulti-diskCPUs– in-memorydatabases– distributeddatabases,and– nowhorizontallyscaleddatabases

DukeCS,Fall2017 CompSci516:DatabaseSystems 51

Page 52: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

WhychooseRDBMSoverNoSQL:3/3

4. Whileno“onesizefitsall”intheSQLproductsthemselves,thereisacommoninterfacewithSQL,transactions,andrelationalschemathatgiveadvantagesintraining,continuity,anddatainterchange

DukeCS,Fall2017 CompSci516:DatabaseSystems 52

Page 53: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

WhychooseNoSQLoverRDBMS:1/31. Wehaven’tyetseengoodbenchmarksshowing

thatRDBMSscanachievescaling comparablewithNoSQLsystemslikeGoogle’sBigTable

2. Ifyouonlyrequirealookupofobjectsbasedonasinglekey– thenakey-valuestoreisadequateandprobablyeasiertounderstand

thanarelationalDBMS– Likewiseforadocumentstoreonasimpleapplication:youonlypay

thelearningcurveforthelevelofcomplexityyourequire

DukeCS,Fall2017 CompSci516:DatabaseSystems 53

Page 54: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

WhychooseNoSQLoverRDBMS:2/3

3. Someapplicationsrequireaflexibleschema– allowingeachobjectinacollectiontohavedifferentattributes

– WhilesomeRDBMSsallowefficient“packing”oftupleswithmissingattributes,andsomeallowaddingnewattributesatruntime,thisisuncommon

DukeCS,Fall2017 CompSci516:DatabaseSystems 54

Page 55: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

WhychooseNoSQLoverRDBMS:3/3

4. ArelationalDBMSmakes“expensive”(multi- nodemulti-table)operations“tooeasy”– NoSQLsystemsmakethemimpossibleorobviouslyexpensiveforprogrammers

5. WhileRDBMSshavemaintainedmajoritymarketshareovertheyears,otherproductshaveestablishedsmallerbutnon-trivialmarketsinareaswherethereisaneedforparticularcapabilities– e.g.indexedobjectswithproductslikeBerkeleyDB,orgraph-following

operationswithobject-orientedDBMSs

DukeCS,Fall2017 CompSci516:DatabaseSystems 55

Page 56: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

ColumnStore

DukeCS,Fall2017 CompSci516:DatabaseSystems 56

Page 57: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

Rowvs.ColumnStore

• Rowstore– storeallattributesofatupletogether– storagelike“row-majororder”inamatrix

• Columnstore– storeallrowsforanattribute(column)together– storagelike“column-majororder”inamatrix

• e.g.– MonetDB,Vertica(earlier,C-store),SAP/SybaseIQ,GoogleBigtable (withcolumngroups)

DukeCS,Fall2017 CompSci516:DatabaseSystems 57

Page 58: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

DukeCS,Fall2017 CompSci516:DatabaseSystems 58

Ack:SlidefromVLDB2009tutorialonColumnstore

Page 59: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

DukeCS,Fall2017 CompSci516:DatabaseSystems 59

Ack:SlidefromVLDB2009tutorialonColumnstore

Page 60: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

DukeCS,Fall2017 CompSci516:DatabaseSystems 60

Ack:SlidefromVLDB2009tutorialonColumnstore

Page 61: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

DukeCS,Fall2017 CompSci516:DatabaseSystems 61

Ack:SlidefromVLDB2009tutorialonColumnstore

Page 62: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

DukeCS,Fall2017 CompSci516:DatabaseSystems 62

Ack:SlidefromVLDB2009tutorialonColumnstore

Page 63: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

DukeCS,Fall2017 CompSci516:DatabaseSystems 63

AdditionalandOptionalSlidesonMongoDB

(MaybeusefulforHW3)https://docs.mongodb.comhttps://docs.mongodb.com/manual/reference/sql-comparison/

Page 64: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

MongoDB

• MongoDBisanopensourcedocumentstorewritteninC++• providesindexesoncollections• lockless• providesadocumentquerymechanism• supportsautomaticsharding• Replicationismostlyusedforfailover• doesnotprovidetheglobalconsistencyofatraditionalDBMS

– butyoucangetlocalconsistencyontheup-to-dateprimarycopyofadocument

• supportsdynamicquerieswithautomaticuseofindices,likeRDBMSs

• alsosupportsmap-reduce– helpscomplexaggregationsacrossdocs

• providesatomicoperationsonfields

DukeCS,Fall2017 CompSci516:DatabaseSystems 64

Optionalslide:Readyourself

Page 65: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

MongoDB:AtomicOpsonFields• Theupdatecommandsupports“modifiers”thatfacilitateatomic

changestoindividualvalues– $setsetsavalue– $inc incrementsavalue– $pushappendsavaluetoanarray– $pushAll appendsseveralvaluestoanarray– $pullremovesavaluefromanarray,and$pullAll removesseveral

valuesfromanarray• Sincetheseupdatesnormallyoccur“inplace”,theyavoidthe

overheadofareturntriptotheserver• Thereisan“updateifcurrent”conventionforchangingadocument

onlyiffieldvaluesmatchagivenpreviousvalue• MongoDBsupportsafindAndModify commandtoperforman

atomicupdateandimmediatelyreturntheupdateddocument– usefulforimplementingqueuesandotherdatastructuresrequiring

atomicity

DukeCS,Fall2017 CompSci516:DatabaseSystems 65

Optionalslide:Readyourself

Page 66: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

MongoDB:Index• MongoDBindicesareexplicitlydefinedusinganensureIndex call– anyexistingindicesareautomaticallyusedforqueryprocessing

• Tofindallproductsreleasedlastyear(2015)orlatercostingunder$100youcouldwrite:

• db.products.find({released:{$gte:newDate(2015,1,1,)},price{‘$lte’:100},})

DukeCS,Fall2017 CompSci516:DatabaseSystems 66

Optionalslide:Readyourself

Page 67: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

MongoDB:Data

• MongoDBstoresdatainabinaryJSON-likeformatcalledBSON– BSONsupportsboolean,integer,float,date,stringandbinarytypes

–MongoDBcanalsosupportlargebinaryobjects,eg.imagesandvideos

– Thesearestoredinchunksthatcanbestreamedbacktotheclientforefficientdelivery

DukeCS,Fall2017 CompSci516:DatabaseSystems 67

Optionalslide:Readyourself

Page 68: NoSQL and Column Store2.Amazon’s Dynamo 3.Google’s BigTable Duke CS, Fall 2017 CompSci 516: Database Systems 14. 1. Memcached: main features •popular open source cache •supports

MongoDB:Replication

• MongoDBsupportsmaster-slavereplicationwithautomaticfailoverandrecovery– Replication(andrecovery)isdoneatthelevelofshards

– Replicationisasynchronousforhigherperformance,sosomeupdatesmaybelostonacrash

DukeCS,Fall2017 CompSci516:DatabaseSystems 68

Optionalslide:Readyourself