web scale data management: an historical...

42
Web Scale Data Management: An Historical Perspec8ve Margo Seltzer March 1, 2018 CS50 for MBAs (2018) 1

Upload: others

Post on 16-Oct-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

WebScaleDataManagement:AnHistoricalPerspec8ve

MargoSeltzerMarch1,2018

CS50forMBAs(2018) 1

Page 2: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

Outline•  Inthebeginning…•  TheheydayofRDBMS•  Therebirthofkey/valuestores•  Key/Valuestorestoday:NoSQL•  NoSQL&key/valueusecases

CS50forMBAs(2018) 2

Page 3: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

IntheBeginning

•  Wherebeginningequals1960’s•  Computers

–  Centralizedsystems–  SpiffynewdatachannelsletCPUandIOoverlap.–  Persistentstorageisondrums.–  BufferingandinterrupthandlingdoneintheOS.– Makingthesesystemsfastisbecomingaresearchfocus.

•  Data– Whatdiddatalooklike?

CS50forMBAs(2018) 3

Page 4: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

OrganizingData:ISAM•  Indexedsequen8alaccessmethod

–  PioneeredbyIBMforitsmainframes–  Fixedlengthrecords–  Eachrecordlivesataspecificloca8on–  Rapidrecordaccessevenonasequen8almedium(tape)

•  Allindexesare‘secondary’– Allowkeylookup,but…– Donotcorrespondtophysicalorganiza8on–  Keyistobuildsmall,efficientindexstructures

•  FundamentalaccessmethodinCOBOL

CS50forMBAs(2018) 4

Page 5: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

OrganizingData:TheNetworkModel

•  Earlydatamanagementsystemsrepresenteddatausingsomethingcalledanetworkmodel.

•  Dataarerepresentedbycollec8onsofrecords(todaywewouldcallthosekey/datapairs).

•  Rela8onshipsamongrecordsareexpressedvialinksbetweenrecords(todaywecanthinkofthoselinksaspointers).

•  Applica8onsinteractedwithdatabynaviga8ngthroughit:–  Findarecord–  Followlinks–  Findotherrecords–  Repeat

CS50forMBAs(2018) 5

Page 6: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

TheNetworkModel:InsideRecords

•  Recordscomposedofahributes(thinkfields)•  Ahributesaresingle-valued.•  Linksconnectexactlytworecords.•  RepresentN-wayrela8onshipsvialinkrecords

CS50forMBAs(2018) 6

ID name phone

Pa8ent

date time

Appointment

name office

Doctor

Schedule

Page 7: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

TheRela8onalModel:TheCompe88on

•  TheNetworkModelhadsomeproblems–  Applica8onshadtoknowthestructureofthedata–  Changingtherepresenta8onrequiredamassiverewrite–  Fundamentally:thephysicalarrangementwas8ghtlycoupledtotheapplica8onandtheapplica8onlogic.

•  1968:TedCoddproposestherela8onalmodel–  Decouplephysicalrepresenta8onfromlogicalrepresenta8on

–  Store“records”as“tables”–  Replacelinkswithimplicitjoinsamongtables

•  Thebigques8on:coulditperform?

CS50forMBAs(2018) 7

Page 8: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

Outline•  Inthebeginning…•  TheheydayofRDBMS•  Therebirthofkey/valuestores•  Key/Valuestorestoday:NoSQL•  NoSQL&key/valueusecases

CS50forMBAs(2018) 8

Page 9: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

TheheydayofRDBMS

•  AnerCodd’spaper,muchdebateensued,buttwogroupssetouttoturnanideaintosonware.–  IBM:System/R– U.C.Berkeley:Ingres

•  Bothwereresearchprojectstoexplorethefeasibilityofimplemen8ngtherela8onalmodel.

•  Bothwerehugelysuccessfulandhadenormousimpact.

•  However,attheircore,you’llno8cesomethinginteres8ng…

CS50forMBAs(2018) 9

Page 10: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

TheDesignofSystemR

CS50forMBAs(2018) 10

ReliableStorageLayer(RSS)

DiskLayer

Read/writeblocks

SchemaandQueryLayer(RDS)

Page 11: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

TheDesignofIngres

CS50forMBAs(2018) 11

UNIXFileSystem

DiskLayer

FileSystemcatalogs

FileDatabasetables

FileLog

UNIXfiles

Command-line terminal program

EQUELprogramsEQUEL

programsEQUELprogramsEQUEL

programs

Page 12: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

TheKey/ValueStoreWithin

•  Alltheserela8onalsystemsandmostdatamanagementsystemshavehiddendeepinsidesomesortofkey/valuestorageengine.

•  Foryears,the“conven8onalwisdom”wastohidethoseKVstoresdeepunderneathaquerylanguageandschemalevel.

•  Majorexcep8onwasCOBOL,whichcon8nuedtouseISAM.

CS50forMBAs(2018) 12

Page 13: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

RDBMSmatured•  Lotsofnewfeatures(SQL-86,SQL-89,SQL-92,SQL:1999,SQL:2003,SQL:2008,SQL:2100?)–  Triggers–  Storedprocedures–  Reportgenerators–  Rules–  …

•  Incorporatednewdatamodels:–  Object-rela8onalsystems–  XML

•  Becameenormouslylarge,complexsystemsrequiringexpertadministra8on.

CS50forMBAs(2018) 13

Page 14: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

RDBMS:TheGoodandtheBad•  RDBMSAdvantages:

–  Declara8vequerylanguagethatdecoupledphysicalandlogicallayout.

–  (Rela8ve)easeofschemamodifica8on.–  Greatsupportforadhocqueries.

•  RDBMSDisadvantages:–  Payfortheoverheadofqueryprocessingevenifyoudon’thavecomplexqueries.

–  RequireDBAfortuningandmaintenance.–  NearlyalwayspayIPCtoaccessdatabaseserver.–  Requireschemadefini8on.–  Notgreatatmanaginghierarchicalorothercomplexrela8onships.

CS50forMBAs(2018) 14

Page 15: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

Outline•  Inthebeginning…•  TheheydayofRDBMS•  Therebirthofkey/valuestores•  Key/Valuestorestoday:NoSQL•  NoSQL&key/valueusecases

CS50forMBAs(2018) 15

Page 16: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

TheAgeoftheInternet

•  Newclassesofapplica8onsemerged– Search– Authen8ca8on(LDAP)– Email– Browsing–  Instantmessaging– Webservers– Onlineretail– Publickeymanagement

CS50forMBAs(2018) 16

Page 17: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

TheseApplica8onswereDifferent•  Theyweren’tsuppor8ngadhocquerymechanisms– Querysetspecifiedbytheapplica8on– Usersinteractedwiththeapplica8onsnotthedata

•  Theyhadfairlysimpleschemas•  Noneedforfancyreports•  Performancewascri8cal

Allthebellsandwhistlesofthebigrela9onalplayersweren’tdeliveringnecessaryfunc9onality,butwere

cos9ngoverhead.

CS50forMBAs(2018) 17

Page 18: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

TheEmergenceofKey/ValueStores

•  In1997,SleepycatSonware,introducedthefirstcommercialkeyvaluestore(nowOracleBerkeleyDB).

•  BerkeleyDBwas/is:–  Embedded:linksdirectlyinanapplica8on’saddressspace

–  Fast:avoidsqueryparsingandop8miza8on–  Scalable:runsfromhandheldtodatacenter–  Flexible:tradeofffunc8onalityandperformance–  Reliable:recoversfromapplica8onandsystemfailure

CS50forMBAs(2018) 18

Page 19: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

Key/ValueversusRDBMS

CS50forMBAs(2018) 19

Key-value Databases Standard RDBMS Pros • Sometimes the “good enough”

solution for simple data schemas • Much smaller footprint • Shorter execution path • More efficient, fewer OS resources • Usually no client/server • No application data mapping

• Rich SQL Support • Data typing • Data relationship management • Dynamic data relationships • Procedural languages (stored procedures) • Parallel query execution • Security, Encryption • Centralized management

Cons • Application-centric schema control • High Availability often missing (not

BDB) • Higher level abstraction APIs often

missing (not BDB) •  Limited RDBMS integration

• Much larger engine, higher overhead, client/server overhead, general purpose RDBMS solution

• Higher administrative burden • Higher TCO

Page 20: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

Fromlocaltodistributed

•  Asthewebmatured,datavolumeandvelocityandcustomerdemandgrewexponen8ally.

•  Exceededsinglesystemcapacity.•  Enteraneweraofscalabilitydemands.•  Infrastructuremigratesfromlarge-scaleSMPtoclustersofcommodityblades.

CS50forMBAs(2018) 20

Page 21: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

Outline•  Inthebeginning…•  TheheydayofRDBMS•  Therebirthofkey/valuestores•  Key/Valuestorestoday:NoSQL

–  Framing– Technology

•  NoSQL&key/valueusecases

CS50forMBAs(2018) 21

Page 22: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

EnterNoSQL

•  Webserviceproviders(e.g.,Google,Amazon,Ebay,Facebook,Yahoo)beganpushingexis8ngdatamanagementsolu8onstotheirlimits.

•  Introducedsharding:Splitthedataupacrossmul8plehoststospreadoutload.

•  Introducedreplica8on:–  Allowaccessfrommul8plesites–  Increaseavailability

•  Relaxconsistency:Didn’talwaysneedtransac8onalseman8cs

CS50forMBAs(2018) 22

Page 23: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

WhatisNoSQL?

•  Not-only-SQL(2009)•  WhileRDBMshavetypicallyfocusedontransac8onsandtheACIDproper8es,NoSQLfocusesonBASE:–  BasicAvailability:Usereplica8ontoreducethelikelihoodofdataunavailabilityandshardingtomakeanyremainingfailurespar8al

–  Sonstate:Allowdatatobeinconsistentandrelegatedesigningaroundsuchinconsistenciestoapplica8ondevelopers.

–  Eventuallyconsistent:Ensureonlythatatsomefuturepointin8methedataassumesaconsistentstate

CS50forMBAs(2018) 23

Page 24: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

WhyNoSQL?•  Threecommonproblems

1.  Unprecedentedtransac8onvolumes2.  Cri8calneedforlowlatencyaccess3.  100%Availability

•  Hardwareshin:FrommassiveSMPtoblades•  ChanginghowwedealwiththeCAPtheorem

–  TheCAPtheoremstatesthatyoucan’thaveallthreeof:•  Consistency:Allnodesseethesamedataatthesame8me.•  Availability:Everyrequestreceivesasuccess/failresponse•  Par88ontolerance:Thesystemcon8nuestooperatedespitearbitrarymessageloss.

–  RDBMs(transac8onalsystems)typicallychooseCA–  NoSQLtypicallychoosesAPorCP

CS50forMBAs(2018) 24

Page 25: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

NoSQLEvolu8on•  1997:BerkeleyDBreleasedprovidingtransac8onalkey/valuestore.•  2001:BerkeleyDBintroducesreplica8onforhighavailability.•  2006:GooglepublishesChubbyandBigTablepapers.

–  Eachsystemprovidesscalable,datamanagementforlooselycoupledsystems

•  2007:AmazonpublishesDynamopaper–  DHT-basedeventuallyconsistentkey/valuestore

•  2008+:Mul8pleOSSprojectstoproduceBigTable/Dynamoknockoffs–  HBase:ApacheHadoopprojectimplementa8onofBigTable(2008)–  CouchDB:Erlang-baseddocumentstore,MVCCandversioning(2008)–  Cassandra:Write-op8mized,column-oriented,secondaryindices

(2009)–  MongoDB:JSON-baseddocumentstore,indexing,sharded(2009)

•  2009+:Commercializa8on–  Companiesemergetosupportopensourceproducts(e.g.,DataStax/

Cassandra,Basho/Riak,Cloudera/HBase,10Gen/MongoDB)–  Companiesdevelopcommercialofferings(Oracle,Citrusleaf)

CS50forMBAs(2018) 25

Page 26: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

NoSQLTechnology

•  Someformofdistribu8onandreplica8on:– Distributedhashtables(DHTs)– Keypar88oning– Replica8oninunderlyingstorageorinNoSQL

•  Rela8onalorkey/valuestoreforunderlyingstorageengine.

•  Secondgenera8onsystemsbuildingcustomwrite-op8mizedengines.– Log-structuredmergetrees(LSM)

CS50forMBAs(2018) 26

Page 27: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

NoSQLDesignSpace

•  Localnodestoragesystem•  Distribu8on•  DataModel•  ConsistencyModel

– Eventualconsistency– Noconsistency– Transac8onalconsistency

CS50forMBAs(2018) 27

Page 28: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

LocalStorageSystems

•  Na8veKVstore– OracleNoSQLDatabase:BerkeleyDBJavaEdi8on–  Basho:Riak’sBitcask,alog-structuredhashtable– Amazon:Dynamo,BDBDataStore,BDBJE(orMySQL)

•  Log-StructuredMergeTrees–  LevelDB

•  Custom–  BigTable(andclones):TabletserversonSSTablesexploi8ngGFS-likesystems.

CS50forMBAs(2018) 28

Page 29: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

Distribu8on

•  Threemainques8ons:– Howdoyoupar88ondata– Howmanycopiesdoyoukeep– Areallcopiesequal

•  Howdoyoupar88ondata– Keypar88oning– Hashpar88oning(onencalledsharding)– Geographicpar88oning(alsocalledsharding)

CS50forMBAs(2018) 29

Page 30: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

MoreDistribu8on

•  Howmanycopiesandhow?– Usetheunderlyingfilesystem(GFS,HDFS)– Threeisgood;fiveisbeher;forhotthingssome8mesgotomany.

•  CopyEquality:– Single-Master

•  MongoDB,OracleNoSQLDatabase,

– Mul8-Master/Masterless•  Riak,CouchDB,Couchbase

CS50forMBAs(2018) 30

Page 31: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

NoSQLDataModels•  Commontomostofferings:

– Denormaliza8on:somedatavaluesareduplicatedtoprovidesuperiorqueryperformance.

– Nojoins•  Key/Value:Lihleornoschema,minimalrangequerysupport– OracleNoSQLDB,DynamoDB,Couchbase,Riak

•  BigTableColumnFamily– Hbase,Cassandra

•  Document-stores– MongoDB,CouchDB

CS50forMBAs(2018) 31

Page 32: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

DataModel:Key/Value•  Keysanddataarebothopaquebytestrings.•  Designedforpointqueries

–  Typicallynono8onofarangequery–  Ifitera8onexists,it’susuallynotinkey-order

•  Examples:–  DynamoDB–  LevelDB–  Riak–  Tokyo/KyotoCabinet–  OracleNoSQLDatabase

•  Extensions:–  OracleNoSQL:Major/Minorkeys,batching–  LevelDB:Atomicbatches

CS50forMBAs(2018) 32

Page 33: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

DataModel:ColumnFamily(1)!  Columnsaregroupedintocolumnfamilies.!  Columnfamilies:

!  Aretypicallystoredtogether!  Canhavedifferentcolumnsforeachrow!  Canhaveduplicateitemsinanycolumn

!  Noschemaortypeenforcement!  Alldataaretreatedasbytestrings

!  Indexedbyrows!  Rowsaregroupedintotablets!  Nosecondaryindexes

CS50forMBAs(2018) 33

Page 34: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

DataModel:ColumnFamily(2)

CS50forMBAs(2018) 34

Keys Name Address

1 FirstMargo

LastSeltzer

No3

StreetMillstoneLane

CityLincoln

StateMA

Zip01773

2000

No394

StreetEastRidingDr

CityCarlisle

StateMA

Zip01741

1993

PO65

CitySonyea

StateNY

Zip14556

1961

3 TitleLady

LastGaga

CityHollywood

StateCA

Zip90027

2008

4 LastMadonna

NewYork LosAngeles London 2000

ColumnFamiliesTimestamps

versions

Page 35: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

DataModel:Document

•  Key/Valuestorewherethevalueisadocumentwithstructure.

•  Documentstoreunderstandsthestructureofdocuments:–  JSON,XML,BSON,PDF,Doc

•  Some8mesdocumentscanhavesub-documents.•  Keyahributeofadocumentstoreisthatitletsyousearchwithindocumentsaswellassearchingfordocuments.

CS50forMBAs(2018) 35

Page 36: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

NoSQLConsistency•  TheCAPtheorem(EricBrewer):itisimpossibleforadistributedcomputer

systemtosimultaneouslyprovide:–  Consistency:Allnodesseethesamedataatthesame8me–  Availability:Everyrequestreceivesaresponseaboutwhetheritsucceeded–  Par88ontolerance:Thesystemcon8nuestooperatedespitearbitrary

messagelossorfailureofpartsofthesystem•  OriginallyposedasaconjecturebyBrewer,laterpreciselyprovenby

GilbertandLynch.•  It’saninteres8nghistorytoread!•  Implica8onsforNoSQL:Pick2

–  SomesystemspickCP–  OthersystemspickAP–  FewsystemspickCA

•  Why?

CS50forMBAs(2018) 36

Page 37: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

FormsofConsistency•  Consistent/Par88onableSystems

–  BigTable,Hbase,HypterTable,MongoDB,Redis,MemcacheDB

•  Available/Par88onableSystems–  Cassandra,SimpleDB,CouchDB,Riak,TokyoCabinet,Dynamo

•  EventualConsistency–  Thereexistmul8pledefini8ons.ThemostpopularoneisduetoVogels:Thestoragesystem

guaranteesthatifnonewupdatesaremadetotheobject,eventually(anertheinconsistencywindowcloses)allaccesseswillreturnthelastupdatedvalue.

–  YoucanbeAPandeventuallyconsistent.

•  Variable:throughcarefulconfigura8onchoosehowconsistentyouwanttobe:–  Cassandra:read/writetoone,quorum,all–  OracleNoSQLDB:CanbeCPwhensewngackpolicyforsimplemajority,elseisAP.–  AmazonDynamo:Pickingread/writequorumstradesoffperformanceanddegreeof

inconsistency

CS50forMBAs(2018) 37

Page 38: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

Outline

•  Inthebeginning…•  TheheydayofRDBMS•  Therebirthofkey/valuestores•  Key/Valuestorestoday:NoSQL•  NoSQL&key/valueusecases

CS50forMBAs(2018) 38

Page 39: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

CassandraUseCase:NexlixAdrianCockcronhhp://www.hpts.ws/sessions/GlobalNexlixHPTS.pdf

•  In2011Nexlixmovedfromcentralizeddatacenterstoanout-sourced,distributedcloud-basedsystem.

•  ReplacedcentralizeddatabasewithCassandrafor:–  Customers– Movies–  History–  Configura8on

•  Why?–  Noneedforschemachanges–  Highperformance–  Scalability

CS50forMBAs(2018) 39

k(%3$!g30!b%20!G2:23#'(!

/QE!Q(%;N!

b%20!G2:23#'(!

O1P#%J'(N!F'(J1#'!

/QE!

7;<&=8>&

FY!

4%C>%3'3$!F'(J1#'P!

C'C#2#+'0!

7?@&

42PP230(2!g4)!

E3$'(32:!O1P5P!

G2#5<>!

F1C>:'OG!

Page 40: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

HBaseUseCase:FacebookMessagesKannanMuthukkaruppan:

hhp://www.hpts.ws/sessions/StorageInfraBehindMessages.pdf

•  Facebookintegratedmessaging,chat,emailandSMSunderasinglenewmessageframework.

•  Why?– Highwritethroughput– Goodreadperformance– Horizontalscalability–  Strongconsistency

•  WhatisinHbase?–  Smallmessages– Messagemetadata–  Searchindex

CS50forMBAs(2018) 40

Page 41: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

MongoDBUseCase:ViperMediahhp://nosql.mypopescu.com/post/16058009985/mongodb-at-viber-media-the-plaxorm-enabling-free

•  ViperusesMongoDBinacentralservicetowhichmobileclientsconnecttoroutemessagestootherclients.

•  Why?– Scalability– Redundancy

•  WhatisinMongoDB?– Variablelengthdocuments– Dic8onaryindexes

CS50forMBAs(2018) 41

Page 42: Web Scale Data Management: An Historical Perspec8vecdn.cs50.net/hbs/2018/q3/classes/web-scale_data_management/we… · • Key/Value stores today: NoSQL • NoSQL & key/value use

WrappingUp•  WhenshouldyoubeconsideringNoSQL?

–  Scalability–  Lowlatency–  Redundancy–  Noadhocqueries–  Joinseasilyimplementableintheapplica8on–  Straightforwardkeystructure

•  AcoupleofsitesIfoundprehyinteres8ng:–  ListofNoSQLDatabases:hhp://nosql-database.org–  NoSQLDataModeling:

hhp://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/

–  RecentPerformanceComparison:hhp://www.networkworld.com/cgi-bin/mailto/x.cgi?pagetosend=/news/tech/2012/102212-nosql-263595.html

CS50forMBAs(2018) 42