Web Scale Data Management: An Historical Perspective
Margo Seltzer, March 1, 2018
CS50 for MBAs (2018)

Outline
• In the beginning…
• The heyday of RDBMS
• The rebirth of key/value stores
• Key/Value stores today: NoSQL
• NoSQL & key/value use cases

In the Beginning
• Where beginning equals the 1960s
• Computers
  – Centralized systems
  – Spiffy new data channels let CPU and I/O overlap.
  – Persistent storage is on drums.
  – Buffering and interrupt handling done in the OS.
  – Making these systems fast is becoming a research focus.
• Data
  – What did data look like?

Organizing Data: ISAM
• Indexed sequential access method
  – Pioneered by IBM for its mainframes
  – Fixed-length records
  – Each record lives at a specific location
  – Rapid record access even on a sequential medium (tape)
• All indexes are 'secondary'
  – Allow key lookup, but…
  – Do not correspond to physical organization
  – Key is to build small, efficient index structures
• Fundamental access method in COBOL

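A minimal Python sketch of the ISAM idea: fixed-length records at known locations, plus a small index kept apart from the data. The 16-byte record layout, the sample rows, and the function names are assumptions for illustration, not anything from IBM's implementation.

```python
import bisect

RECORD_SIZE = 16  # assumed layout: 4-byte key + 12-byte name, space padded


def make_record(key, name):
    return key.to_bytes(4, "big") + name.encode("ascii")[:12].ljust(12, b" ")


# Records sit at fixed slots in arrival order, not in key order.
rows = [(42, "eve"), (3, "ann"), (90, "hal"), (15, "cho")]
data_file = b"".join(make_record(k, n) for k, n in rows)

# The index is a small sorted list of (key, slot) pairs, separate from the data.
index = sorted((k, slot) for slot, (k, _) in enumerate(rows))
index_keys = [k for k, _ in index]


def lookup(key):
    """Binary-search the index, then jump straight to the record's slot."""
    i = bisect.bisect_left(index_keys, key)
    if i == len(index_keys) or index_keys[i] != key:
        return None
    slot = index[i][1]
    rec = data_file[slot * RECORD_SIZE:(slot + 1) * RECORD_SIZE]
    return rec[4:].decode("ascii").rstrip()
```

Because every record has the same length, a slot number is all the index needs to store; the byte offset is just slot times record size.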
Organizing Data: The Network Model
• Early data management systems represented data using something called a network model.
• Data are represented by collections of records (today we would call those key/data pairs).
• Relationships among records are expressed via links between records (today we can think of those links as pointers).
• Applications interacted with data by navigating through it:
  – Find a record
  – Follow links
  – Find other records
  – Repeat

The Network Model: Inside Records
• Records composed of attributes (think fields)
• Attributes are single-valued.
• Links connect exactly two records.
• Represent N-way relationships via link records
[Figure: "Schedule" example with three linked record types: Patient (ID, name, phone), Appointment (date, time), Doctor (name, office)]

The Relational Model: The Competition
• The Network Model had some problems
  – Applications had to know the structure of the data
  – Changing the representation required a massive rewrite
  – Fundamentally: the physical arrangement was tightly coupled to the application and the application logic.
• 1970: Ted Codd proposes the relational model
  – Decouple physical representation from logical representation
  – Store "records" as "tables"
  – Replace links with implicit joins among tables
• The big question: could it perform?

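For contrast, here is a small sketch of the relational approach using Python's built-in `sqlite3` module. The table names, columns, and rows are assumptions chosen to mirror a patient/doctor/appointment example; the point is that the query names tables and a join condition, not a navigation path.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    CREATE TABLE doctor (id INTEGER PRIMARY KEY, name TEXT, office TEXT);
    CREATE TABLE appointment (patient_id INTEGER, doctor_id INTEGER,
                              date TEXT, time TEXT);
    INSERT INTO patient VALUES (1, 'Pat', '555-0100');
    INSERT INTO doctor VALUES (7, 'Dr. Lee', 'B12');
    INSERT INTO appointment VALUES (1, 7, '2018-03-01', '09:30');
""")

# Declarative: say what you want; the engine decides how to find it.
rows = db.execute("""
    SELECT d.name, a.date
    FROM patient p
    JOIN appointment a ON a.patient_id = p.id
    JOIN doctor d ON d.id = a.doctor_id
    WHERE p.name = ?
""", ("Pat",)).fetchall()
```

If the physical layout changes (say, an index is added), this query stays the same.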
The heyday of RDBMS
• After Codd's paper, much debate ensued, but two groups set out to turn an idea into software.
  – IBM: System R
  – U.C. Berkeley: Ingres
• Both were research projects to explore the feasibility of implementing the relational model.
• Both were hugely successful and had enormous impact.
• However, at their core, you'll notice something interesting…

The Design of System R
[Figure: layered architecture, top to bottom]
• Schema and Query Layer (RDS)
• Reliable Storage Layer (RSS)
• Disk Layer (read/write blocks)

The Design of Ingres
[Figure: Ingres architecture]
• Clients: command-line terminal program; EQUEL programs
• Storage: system catalogs, database tables, and the log, each kept in UNIX files
• UNIX File System
• Disk Layer

The Key/Value Store Within
• All these relational systems and most data management systems have hidden deep inside some sort of key/value storage engine.
• For years, the "conventional wisdom" was to hide those KV stores deep underneath a query language and schema level.
• Major exception was COBOL, which continued to use ISAM.

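To make the "key/value store within" concrete, here is a toy Python sketch of a table layer sitting on a key/value engine. The dict standing in for the engine, the `table/primary-key` key encoding, and the JSON row encoding are all assumptions; real systems use their own formats.

```python
import json

store = {}  # stand-in for the key/value storage engine: bytes -> bytes


def put_row(table, pk, row):
    store[f"{table}/{pk}".encode()] = json.dumps(row).encode()


def get_row(table, pk):
    raw = store.get(f"{table}/{pk}".encode())
    return None if raw is None else json.loads(raw)


def scan(table):
    """Yield every row of a table by walking keys that share its prefix."""
    prefix = f"{table}/".encode()
    for key in sorted(store):
        if key.startswith(prefix):
            yield json.loads(store[key])


put_row("doctor", 7, {"name": "Dr. Lee", "office": "B12"})
put_row("doctor", 9, {"name": "Dr. Kim", "office": "C3"})
put_row("patient", 1, {"name": "Pat"})

# Roughly what "SELECT name FROM doctor WHERE office = 'B12'" does underneath.
names = [r["name"] for r in scan("doctor") if r["office"] == "B12"]
```

The query language and schema live above `put_row`, `get_row`, and `scan`; the engine below sees only opaque keys and values.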
RDBMS matured
• Lots of new features (SQL-86, SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2008, SQL:2100?)
  – Triggers
  – Stored procedures
  – Report generators
  – Rules
  – …
• Incorporated new data models:
  – Object-relational systems
  – XML
• Became enormously large, complex systems requiring expert administration.

RDBMS: The Good and the Bad
• RDBMS Advantages:
  – Declarative query language that decoupled physical and logical layout.
  – (Relative) ease of schema modification.
  – Great support for ad hoc queries.
• RDBMS Disadvantages:
  – Pay for the overhead of query processing even if you don't have complex queries.
  – Require DBA for tuning and maintenance.
  – Nearly always pay IPC to access database server.
  – Require schema definition.
  – Not great at managing hierarchical or other complex relationships.

The Age of the Internet
• New classes of applications emerged
  – Search
  – Authentication (LDAP)
  – Email
  – Browsing
  – Instant messaging
  – Web servers
  – Online retail
  – Public key management

These Applications were Different
• They weren't supporting ad hoc query mechanisms
  – Query set specified by the application
  – Users interacted with the applications, not the data
• They had fairly simple schemas
• No need for fancy reports
• Performance was critical
All the bells and whistles of the big relational players weren't delivering necessary functionality, but were costing overhead.

The Emergence of Key/Value Stores
• In 1997, Sleepycat Software introduced the first commercial key/value store (now Oracle Berkeley DB).
• Berkeley DB was/is:
  – Embedded: links directly in an application's address space
  – Fast: avoids query parsing and optimization
  – Scalable: runs from handheld to data center
  – Flexible: trade off functionality and performance
  – Reliable: recovers from application and system failure

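To show what "embedded" means in practice, here is a short sketch using Python's standard-library `dbm.dumb` module as a stand-in. This is not Berkeley DB and has none of its transactional or recovery features; the file name and keys are made up. What it shares is the shape: the store is a library call inside the application's own process, with no server and no query parser.

```python
import dbm.dumb
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "users")  # hypothetical database file

# Open (creating if needed), write a few pairs, and close: plain library calls.
db = dbm.dumb.open(path, "c")
db[b"user:1"] = b"margo"
db[b"user:2"] = b"ada"
db.close()

# Reopen read-only; lookups are direct key fetches, nothing is parsed or planned.
db = dbm.dumb.open(path, "r")
value = db[b"user:1"]
missing = db.get(b"user:99")
db.close()
```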
Key/Value versus RDBMS
Key-value databases
• Pros
  – Sometimes the "good enough" solution for simple data schemas
  – Much smaller footprint
  – Shorter execution path
  – More efficient, fewer OS resources
  – Usually no client/server
  – No application data mapping
• Cons
  – Application-centric schema control
  – High availability often missing (not BDB)
  – Higher-level abstraction APIs often missing (not BDB)
  – Limited RDBMS integration
Standard RDBMS
• Pros
  – Rich SQL support
  – Data typing
  – Data relationship management
  – Dynamic data relationships
  – Procedural languages (stored procedures)
  – Parallel query execution
  – Security, encryption
  – Centralized management
• Cons
  – Much larger engine, higher overhead, client/server overhead, general-purpose RDBMS solution
  – Higher administrative burden
  – Higher TCO

From local to distributed
• As the web matured, data volume and velocity and customer demand grew exponentially.
• Exceeded single system capacity.
• Enter a new era of scalability demands.
• Infrastructure migrates from large-scale SMP to clusters of commodity blades.

Outline
• In the beginning…
• The heyday of RDBMS
• The rebirth of key/value stores
• Key/Value stores today: NoSQL
  – Framing
  – Technology
• NoSQL & key/value use cases

Enter NoSQL
• Web service providers (e.g., Google, Amazon, eBay, Facebook, Yahoo) began pushing existing data management solutions to their limits.
• Introduced sharding: Split the data up across multiple hosts to spread out load.
• Introduced replication:
  – Allow access from multiple sites
  – Increase availability
• Relax consistency: Didn't always need transactional semantics

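A minimal Python sketch of sharding by hashing the key. The host names, the number of hosts, and the use of SHA-256 are assumptions for illustration; each "host" is just an in-memory dict here.

```python
import hashlib

HOSTS = ["db0", "db1", "db2", "db3"]  # hypothetical host names


def shard_for(key):
    """Pick a host from a stable hash of the key (not Python's hash())."""
    digest = hashlib.sha256(key.encode()).digest()
    return HOSTS[int.from_bytes(digest[:8], "big") % len(HOSTS)]


shards = {h: {} for h in HOSTS}  # one dict per host stands in for its storage


def put(key, value):
    shards[shard_for(key)][key] = value


def get(key):
    return shards[shard_for(key)].get(key)


for i in range(1000):
    put(f"user:{i}", i)
```

Every reader and writer computes the same host for a given key, so no central directory is needed. The catch with plain modulo hashing is that changing the number of hosts moves most keys.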
What is NoSQL?
• Not-only-SQL (2009)
• While RDBMSs have typically focused on transactions and the ACID properties, NoSQL focuses on BASE:
  – Basically available: Use replication to reduce the likelihood of data unavailability and sharding to make any remaining failures partial
  – Soft state: Allow data to be inconsistent and relegate designing around such inconsistencies to application developers.
  – Eventually consistent: Ensure only that at some future point in time the data assumes a consistent state

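A toy Python sketch of soft state and eventual consistency with two replicas. The version numbers, the "higher version wins" rule, and the `anti_entropy` sync function are assumptions for illustration; real systems use a variety of reconciliation schemes.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, version, value):
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange entries in both directions; the higher version wins."""
    for src, dst in ((a, b), (b, a)):
        for key, (version, value) in list(src.data.items()):
            dst.write(key, version, value)


r1, r2 = Replica(), Replica()
r1.write("cart", 1, ["book"])
r2.write("cart", 1, ["book"])
r1.write("cart", 2, ["book", "pen"])  # the update has reached only r1 so far
stale = r2.read("cart")               # soft state: r2 still serves the old value
anti_entropy(r1, r2)                  # background sync; replicas now agree
```

Between the write and the sync, a reader may see either value depending on which replica answers. Coping with that window is the application developer's job.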
Why NoSQL?
• Three common problems
  1. Unprecedented transaction volumes
  2. Critical need for low latency access
  3. 100% availability
• Hardware shift: From massive SMP to blades
• Changing how we deal with the CAP theorem
  – The CAP theorem states that you can't have all three of:
    • Consistency: All nodes see the same data at the same time.
    • Availability: Every request receives a success/fail response
    • Partition tolerance: The system continues to operate despite arbitrary message loss.
  – RDBMSs (transactional systems) typically choose CA
  – NoSQL typically chooses AP or CP

NoSQL Evolution
• 1997: Berkeley DB released, providing a transactional key/value store.
• 2001: Berkeley DB introduces replication for high availability.
• 2006: Google publishes Chubby and BigTable papers.
  – Each system provides scalable data management for loosely coupled systems
• 2007: Amazon publishes Dynamo paper
  – DHT-based eventually consistent key/value store
• 2008+: Multiple OSS projects to produce BigTable/Dynamo knockoffs
  – HBase: Apache Hadoop project implementation of BigTable (2008)
  – CouchDB: Erlang-based document store, MVCC and versioning (2008)
  – Cassandra: Write-optimized, column-oriented, secondary indices (2009)
  – MongoDB: JSON-based document store, indexing, sharded (2009)
• 2009+: Commercialization
  – Companies emerge to support open source products (e.g., DataStax/Cassandra, Basho/Riak, Cloudera/HBase, 10gen/MongoDB)
  – Companies develop commercial offerings (Oracle, Citrusleaf)

NoSQL Technology
• Some form of distribution and replication:
  – Distributed hash tables (DHTs)
  – Key partitioning
  – Replication in underlying storage or in NoSQL
• Relational or key/value store for underlying storage engine.
• Second generation systems building custom write-optimized engines.
  – Log-structured merge trees (LSM)

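A very small Python sketch of the log-structured merge idea: writes go to an in-memory table, full tables are flushed as sorted runs, and reads check the newest data first. The class name, the tiny flush threshold, and the in-memory "runs" are assumptions; real engines write runs to disk and compact in levels.

```python
import bisect


class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # newest first; each run is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush as one sorted run: a sequential write, never an in-place update.
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:  # the newest run holding the key wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all runs into one, keeping only the newest value per key."""
        merged = {}
        for run in reversed(self.runs):  # oldest first so newer values overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())]


db = TinyLSM()
for k, v in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    db.put(k, v)
runs_before = len(db.runs)
latest_a = db.get("a")
db.compact()
```

Writes stay cheap because they never seek to an old copy; the cost is paid later by reads that check several runs and by compaction.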
NoSQL Design Space
• Local node storage system
• Distribution
• Data Model
• Consistency Model
  – Eventual consistency
  – No consistency
  – Transactional consistency

Local Storage Systems
• Native KV store
  – Oracle NoSQL Database: Berkeley DB Java Edition
  – Basho: Riak's Bitcask, a log-structured hash table
  – Amazon: Dynamo, BDB Data Store, BDB JE (or MySQL)
• Log-Structured Merge Trees
  – LevelDB
• Custom
  – BigTable (and clones): Tablet servers on SSTables exploiting GFS-like systems.

Distribution
• Three main questions:
  – How do you partition data
  – How many copies do you keep
  – Are all copies equal
• How do you partition data
  – Key partitioning
  – Hash partitioning (often called sharding)
  – Geographic partitioning (also called sharding)

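A short Python sketch contrasting key (range) partitioning with hash partitioning. The split points, the partition count, and the sample names are assumptions picked by hand for illustration.

```python
import bisect
import hashlib

# Key (range) partitioning: partitions are [..g), [g..n), [n..t), [t..].
SPLITS = ["g", "n", "t"]


def range_partition(key):
    return bisect.bisect_right(SPLITS, key)


def hash_partition(key, n=4):
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n


names = ["alice", "bob", "carol", "dave"]
range_parts = [range_partition(k) for k in names]  # neighbors stay together
hash_parts = [hash_partition(k) for k in names]    # neighbors usually scatter
```

Range partitioning keeps adjacent keys on one partition, which suits range scans but can create hot spots; hashing spreads load but gives up key order.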
More Distribution
• How many copies and how?
  – Use the underlying file system (GFS, HDFS)
  – Three is good; five is better; for hot things sometimes go to many.
• Copy Equality:
  – Single-Master
    • MongoDB, Oracle NoSQL Database
  – Multi-Master/Masterless
    • Riak, CouchDB, Couchbase

NoSQL Data Models
• Common to most offerings:
  – Denormalization: some data values are duplicated to provide superior query performance.
  – No joins
• Key/Value: Little or no schema, minimal range query support
  – Oracle NoSQL DB, DynamoDB, Couchbase, Riak
• BigTable Column Family
  – HBase, Cassandra
• Document stores
  – MongoDB, CouchDB

Data Model: Key/Value
• Keys and data are both opaque byte strings.
• Designed for point queries
  – Typically no notion of a range query
  – If iteration exists, it's usually not in key order
• Examples:
  – DynamoDB
  – LevelDB
  – Riak
  – Tokyo/Kyoto Cabinet
  – Oracle NoSQL Database
• Extensions:
  – Oracle NoSQL: Major/Minor keys, batching
  – LevelDB: Atomic batches

Data Model: Column Family (1)
• Columns are grouped into column families.
• Column families:
  – Are typically stored together
  – Can have different columns for each row
  – Can have duplicate items in any column
• No schema or type enforcement
  – All data are treated as byte strings
• Indexed by rows
  – Rows are grouped into tablets
  – No secondary indexes

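Data Model: Column Family (2)
[Figure: example table with row keys, two column families (Name, Address), and timestamps identifying versions]
• Row 1
  – Name: First = Margo, Last = Seltzer
  – Address (2000): No = 3, Street = Millstone Lane, City = Lincoln, State = MA, Zip = 01773
  – Address (1993): No = 394, Street = East Riding Dr, City = Carlisle, State = MA, Zip = 01741
  – Address (1961): PO = 65, City = Sonyea, State = NY, Zip = 14556
• Row 3
  – Name: Title = Lady, Last = Gaga
  – Address (2008): City = Hollywood, State = CA, Zip = 90027
• Row 4
  – Name: Last = Madonna
  – Address (2000): New York, Los Angeles, London
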
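A rough Python sketch of the column-family layout in the example above: row key, then column family, then column, then timestamped versions. The nested-dict representation and the `put` / `get` helpers are assumptions for illustration, not any particular system's API.

```python
from collections import defaultdict

# row key -> column family -> column -> list of (timestamp, value), newest first
table = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))


def put(row, family, column, value, ts):
    cells = table[row][family][column]
    cells.append((ts, value))
    cells.sort(reverse=True)


def get(row, family, column, as_of=None):
    """Return the newest version, or the newest one no later than as_of."""
    for ts, value in table[row][family][column]:
        if as_of is None or ts <= as_of:
            return value
    return None


put("1", "Name", "First", "Margo", 2000)
put("1", "Name", "Last", "Seltzer", 2000)
put("1", "Address", "City", "Carlisle", 1993)
put("1", "Address", "City", "Lincoln", 2000)
put("3", "Name", "Title", "Lady", 2008)
put("3", "Name", "Last", "Gaga", 2008)
```

Rows need not share columns: row 3 has a Title but no First, and nothing enforces a schema.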
Data Model: Document
• Key/Value store where the value is a document with structure.
• Document store understands the structure of documents:
  – JSON, XML, BSON, PDF, Doc
• Sometimes documents can have sub-documents.
• Key attribute of a document store is that it lets you search within documents as well as searching for documents.

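A small Python sketch of what "search within documents" adds over a plain key/value lookup. The sample documents, the dotted-path syntax, and the helper names are made up; real document stores have their own query languages and use indexes rather than a full scan.

```python
docs = {
    "u1": {"name": "Margo", "address": {"city": "Lincoln", "state": "MA"}},
    "u2": {"name": "Ada", "address": {"city": "Sonyea", "state": "NY"}},
    "u3": {"name": "Lin", "tags": ["vip"]},
}


def get_path(doc, path):
    """Follow a dotted path such as 'address.state' into sub-documents."""
    node = doc
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def find(path, value):
    """Search inside documents: return keys of documents whose field matches."""
    return sorted(k for k, d in docs.items() if get_path(d, path) == value)


by_key = docs["u2"]["name"]              # searching for a document
in_ma = find("address.state", "MA")      # searching within documents
```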
NoSQL Consistency
• The CAP theorem (Eric Brewer): it is impossible for a distributed computer system to simultaneously provide:
  – Consistency: All nodes see the same data at the same time
  – Availability: Every request receives a response about whether it succeeded
  – Partition tolerance: The system continues to operate despite arbitrary message loss or failure of parts of the system
• Originally posed as a conjecture by Brewer, later precisely proven by Gilbert and Lynch.
• It's an interesting history to read!
• Implications for NoSQL: Pick 2
  – Some systems pick CP
  – Other systems pick AP
  – Few systems pick CA
• Why?

Forms of Consistency
• Consistent/Partitionable Systems
  – BigTable, HBase, Hypertable, MongoDB, Redis, MemcacheDB
• Available/Partitionable Systems
  – Cassandra, SimpleDB, CouchDB, Riak, Tokyo Cabinet, Dynamo
• Eventual Consistency
  – There exist multiple definitions. The most popular one is due to Vogels: The storage system guarantees that if no new updates are made to the object, eventually (after the inconsistency window closes) all accesses will return the last updated value.
  – You can be AP and eventually consistent.
• Variable: through careful configuration, choose how consistent you want to be:
  – Cassandra: read/write to one, quorum, all
  – Oracle NoSQL DB: Can be CP when setting ack policy for simple majority, else is AP.
  – Amazon Dynamo: Picking read/write quorums trades off performance and degree of inconsistency

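A toy Python sketch of the read/write quorum trade-off mentioned for Cassandra and Dynamo. It uses the standard rule of thumb that a read of R replicas is sure to see the latest write to W replicas when R + W > N. The replica count, the version numbers, and the worst-case choice of which replicas answer are assumptions for illustration.

```python
N = 3  # replicas per key (assumed)


def overlaps(r, w, n=N):
    """True when every read quorum must intersect every write quorum."""
    return r + w > n


replicas = [{"v": 1, "val": "old"} for _ in range(N)]


def write(val, w):
    version = max(rep["v"] for rep in replicas) + 1
    for rep in replicas[:w]:  # only w replicas acknowledge before we return
        rep["v"], rep["val"] = version, val


def read(r_count):
    # Worst case for the reader: it contacts the replicas written last.
    contacted = replicas[-r_count:]
    return max(contacted, key=lambda rep: rep["v"])["val"]


write("new", w=2)
fast_read = read(1)    # R + W = 3, not > N: may return the stale value
quorum_read = read(2)  # R + W = 4 > N: must include an up-to-date replica
```

Smaller R and W mean fewer replicas to wait for, so lower latency, at the price of possibly stale reads.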
Cassandra Use Case: Netflix
Adrian Cockcroft: http://www.hpts.ws/sessions/GlobalNetflixHPTS.pdf
• In 2011 Netflix moved from centralized data centers to an outsourced, distributed cloud-based system.
• Replaced centralized database with Cassandra for:
  – Customers
  – Movies
  – History
  – Configuration
• Why?
  – No need for schema changes
  – High performance
  – Scalability

[Figure: architecture diagram. Labels: Front End Load Balancer; API Proxy; Load Balancer; Discovery Service; API; S3; Component Services; memcached; Cassandra (EC2); Internal Disks; Backup; SimpleDB]

HBase Use Case: Facebook Messages
Kannan Muthukkaruppan: http://www.hpts.ws/sessions/StorageInfraBehindMessages.pdf
• Facebook integrated messaging, chat, email and SMS under a single new message framework.
• Why?
  – High write throughput
  – Good read performance
  – Horizontal scalability
  – Strong consistency
• What is in HBase?
  – Small messages
  – Message metadata
  – Search index

MongoDB Use Case: Viber Media
http://nosql.mypopescu.com/post/16058009985/mongodb-at-viber-media-the-platform-enabling-free
• Viber uses MongoDB in a central service to which mobile clients connect to route messages to other clients.
• Why?
  – Scalability
  – Redundancy
• What is in MongoDB?
  – Variable length documents
  – Dictionary indexes

Wrapping Up
• When should you be considering NoSQL?
  – Scalability
  – Low latency
  – Redundancy
  – No ad hoc queries
  – Joins easily implementable in the application
  – Straightforward key structure
• A couple of sites I found pretty interesting:
  – List of NoSQL Databases: http://nosql-database.org
  – NoSQL Data Modeling: http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/
  – Recent Performance Comparison: http://www.networkworld.com/cgi-bin/mailto/x.cgi?pagetosend=/news/tech/2012/102212-nosql-263595.html