
Abstraction and Performance in Database Systems Christoph Koch EPFL DATA Lab

Post on 27-Jan-2021


  • Abstraction and Performance in Database Systems

    Christoph Koch, EPFL DATA Lab

  • Contents
    •  Expressiveness vs. efficient evaluation of declarative languages
      –  How Georg shaped me and this talk
    •  Domain-specific languages are hot across computer science
      –  DSLs vs declarative languages
    •  Epidemiology of Database People Missing Boats Disorder (DMBD)
      –  In DB systems: The Scalability Blunder: NoSQL
      –  In DB systems: How DSLs make DB performance work mainstream… and folklore.
      –  In DB theory: Where are the PODS people in the DSL revolution?
    •  Opportunities: Non-Turing complete DSLs & FMT
    •  What I do

  • My collaboration with Georg
    •  20+ joint papers on expressiveness, complexity, and efficient evaluation of declarative/query languages.
    •  Things I learned from Georg:
      –  How to do research, really
      –  How to write a PODS paper :-)
      –  Using declarative languages creatively
      –  Expressiveness vs. complexity is not a zero-sum game!
      –  One can’t just write papers and have a career here, but advance human knowledge!
      –  Much more

  • The years 0 to 13 AG
    •  I did more work on declarative languages
      –  E.g. for probabilistic databases and video games
    •  I moved more into systems
    •  How could I combine declarative languages, expressiveness/efficiency with systems?
      –  Domain-specific languages
      –  Databases and compilation/code generation for performance.
    •  This is what I currently mostly do.

  • Declarative languages and DSLs
    •  Domain-specific languages (DSLs)
      –  Engineered languages
      –  Usually Turing-complete
      –  Embedded DSL: classical PL (e.g. Java) + library (domain-specific vocabulary)
    •  SQL is a DSL (domain = database querying)
      –  But most new DSLs are not very declarative.
    •  In Turing-complete DSLs: (compiler) optimizations tend to be local and sometimes brittle.
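
To make the "classical PL + library" idea concrete, here is a minimal sketch of an embedded DSL in Scala. All names (`TinyQueryDsl`, `Row`, `Query`, `from`, `where`, `select`) are hypothetical and chosen for illustration only; they do not come from the talk or from any real library.

```scala
object TinyQueryDsl {
  // Hypothetical row type: column name -> integer value.
  type Row = Map[String, Int]

  // The "domain-specific vocabulary" is just a set of library methods
  // on an ordinary host-language class.
  final case class Query(rows: List[Row]) {
    // Keep only the rows satisfying the predicate.
    def where(p: Row => Boolean): Query = Query(rows.filter(p))

    // Keep only the named columns of every row.
    def select(cols: String*): Query =
      Query(rows.map(r => r.filter { case (k, _) => cols.contains(k) }))

    // Return the result rows.
    def run: List[Row] = rows
  }

  // Entry point of the DSL.
  def from(rows: List[Row]): Query = Query(rows)
}
```

A query then reads like `TinyQueryDsl.from(people).where(r => r("age") > 30).select("id").run`. Because the host language stays Turing-complete, nothing stops a user from putting arbitrary code inside the predicate, which is one reason global optimization of such DSLs is hard.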

  • DSLs are hot!
    •  Motivation: not declarativity but performance
      –  Compensate for the failure of Dennard scaling and Moore’s law.
      –  We don’t know how to build robust optimizing compilers with deep/global optimizations.
      –  Consequence: Domain-specific compilation, with opportunities for automatic software specialization.
    •  People all over CS are flocking to DSLs
      –  Computer architecture. ASPLOS; Chisel, …
      –  HPC & Graphics: OpenGL, Halide, …
      –  Systems, databases: LegoBase, S-Store …

  • DSLs and code generation
    •  Software specialization by compilation.
    •  Staging / partial evaluation (e.g. specialize DBMS code for a given schema).
    •  DSL compiler frameworks make it easy to add domain-specific code optimizations.
    •  Usage in domain makes them robust.
    •  Squid: github.com/epfldata/squid [Parreaux, Shaikhha, K., GPCE 2017, Scala 2017, POPL 2018]
    •  Increasingly, DSLs enable code generation that matches or outperforms human systems programming experts!
    •  Observed in multiple domains, e.g. linear transforms [Spiral], OLAP [LegoBase], OLTP [S-Store]
    •  “Abstraction without Regret” [Rompf & Odersky, CACM; K., CIDR 2013]
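
The idea of specializing code for a given schema can be illustrated with a small sketch. It uses plain closures rather than real staging or code generation, so it only approximates what a framework such as Squid or LegoBase does; the names `SchemaSpecialization`, `sumGeneric` and `specializeSum` are hypothetical.

```scala
object SchemaSpecialization {
  // A row is a flat array of integers; the schema names its positions.
  type Row = Array[Int]

  // Generic version: looks the column name up in the schema for every row.
  def sumGeneric(schema: Vector[String], rows: Seq[Row], col: String): Long =
    rows.foldLeft(0L)((acc, r) => acc + r(schema.indexOf(col)))

  // "Specialized" version: the schema lookup happens once, when the schema
  // is known, and the returned function only contains the per-row work.
  def specializeSum(schema: Vector[String], col: String): Seq[Row] => Long = {
    val i = schema.indexOf(col)
    require(i >= 0, s"unknown column $col")
    rows => rows.foldLeft(0L)((acc, r) => acc + r(i))
  }
}
```

A staging compiler would go further and emit residual code in which the index is a literal constant, but the split between schema-time and row-time work is the same.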

  • S-Store TPC-C benchmark results

    [Figure: TPC-C benchmark results; surviving label: OLTPX]

    Dashti, John, K., 2014

  • DSLs and the role of database research
    •  Relational databases created many firsts.
    •  SQL is still the most successful DSL
    •  RDBMS shows how to build an entire system, the entire stack, for executing SQL efficiently.
    •  Algebras, plan languages, cost-based optimization, logical vs. physical data representation; managing the memory hierarchy, mem hierarchy-aware operator implementation.
      –  The basic pipeline and architecture is the foundation of all modern DSL-based systems.
      –  Some credit is given (e.g. GraphLab), but the database contribs are increasingly taken as a historical footnote across CS.
    •  Also, are we still innovating in any significant way?

  • DSLs and the role of database research

    Database performance techniques are becoming mainstream … and the role of databases fades away. In two ways:
      –  The contributions of the DB community are becoming a historical footnote.
        •  Database ideas stop being considered database ideas.
      –  Database functionality is integrated into other kinds of systems, and classical DBMS will be used in fewer scenarios.

  • Example 1: row/columnar representations
    •  Much hyped (M. Stonebraker). Various DBMS built
      –  Vertica, SAP Hana, …
    •  But: It’s CS folklore now.
    •  Ubiquitous in programming tools
      –  List: n+1 objects
      –  Pair: 3 objects
      –  Makes a huge performance difference in OO runtime systems, e.g. JVM: boxing/unboxing overheads!!!
    •  Heavily used in HPC, graphics, ML, …
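
The row versus columnar distinction can be sketched in a few lines of Scala. The names (`RowVsColumn`, `Person`, `PersonColumns`) are hypothetical; the point is only that the row layout allocates one object per record, while the columnar layout keeps each field in one primitive array.

```scala
object RowVsColumn {
  // Row layout: one heap object per record.
  final case class Person(id: Int, age: Int)

  // Columnar layout: one unboxed primitive array per field.
  final case class PersonColumns(ids: Array[Int], ages: Array[Int])

  // Convert a row-wise collection into the columnar layout.
  def toColumns(rows: Seq[Person]): PersonColumns =
    PersonColumns(rows.map(_.id).toArray, rows.map(_.age).toArray)

  // Aggregate over rows: touches every Person object.
  def sumAgesRows(rows: Seq[Person]): Long =
    rows.foldLeft(0L)(_ + _.age)

  // Aggregate over one column: a tight loop over a contiguous Array[Int].
  def sumAgesColumns(cols: PersonColumns): Long = {
    var total = 0L
    var i = 0
    while (i < cols.ages.length) { total += cols.ages(i); i += 1 }
    total
  }
}
```

Both functions compute the same sum; how much faster the columnar loop is depends on the runtime and data size, so no numbers are claimed here.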

  • Example 2: GRACE Hash join
    •  Classical database course material. Seems uniquely about databases (?)
    •  Main-mem DB case: hash join becomes the trivial implementation.
    •  GRACE hash-join = main mem hash join + staging for the mem hierarchy.
    •  Mem hierarchy considerations have by now been better analyzed/addressed by the compilers, computer architecture and HPC communities.
      –  general/automatic algo transformation techniques exist (loop tiling & superoptimization; see Aho et al. Dragon Book 2nd Ed. Chapter 11)
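
The equation "GRACE hash join = main-memory hash join + staging for the memory hierarchy" can be sketched as follows. This is an in-memory illustration only: a real GRACE join writes partitions to disk, which is omitted here. The names `GraceHashJoinSketch`, `hashJoin` and `graceHashJoin` and the tuple types are assumptions made for the example.

```scala
object GraceHashJoinSketch {
  type Tuple = (Int, String) // (join key, payload)

  // Plain main-memory hash join: build a hash table on r, probe with s.
  def hashJoin(r: Seq[Tuple], s: Seq[Tuple]): Seq[(Int, String, String)] = {
    val table = r.groupBy(_._1)
    s.flatMap { case (k, sv) =>
      table.getOrElse(k, Seq.empty).map { case (_, rv) => (k, rv, sv) }
    }
  }

  // GRACE-style join: first partition both inputs by a hash of the key,
  // then run the main-memory join on each pair of matching partitions.
  // Tuples with equal keys always land in the same partition number.
  def graceHashJoin(r: Seq[Tuple], s: Seq[Tuple], partitions: Int): Seq[(Int, String, String)] = {
    require(partitions > 0)
    def part(k: Int): Int = java.lang.Math.floorMod(k, partitions)
    val rParts = r.groupBy(t => part(t._1))
    val sParts = s.groupBy(t => part(t._1))
    (0 until partitions).flatMap { p =>
      hashJoin(rParts.getOrElse(p, Seq.empty), sParts.getOrElse(p, Seq.empty))
    }
  }
}
```

The partitioning step plays the same role as tiling a loop nest: it shrinks the working set of each inner join so that it fits into a faster level of the memory hierarchy.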

  • A case of missing the boat
    •  Is there anything about DB Performance that won’t be absorbed into the CS systems/performance mainstream?
    •  Conjecture: No.
    •  Experience in the DBLab project (github.com/epfldata/dblab) [Shaikhha, …, K., VLDB 2014, SIGMOD 2016, TODS 2018, JFP 2018].
      –  We are building a library of compiler optimizations for data-intensive systems, by abstracting from a database system (LegoBase).
      –  After cleaning up, none seem really specific to databases.
    •  This is a problem for the future of database research.

  • Database People Missing Boats Disorder (DMBD): a pandemic?
    •  Causes:
      –  Lack of care to recognize major CS trends (early)
      –  Lack of effort to abstract & generalize results
      –  Catering too much to reviewers in a calcified & broken system of conferences.
    •  Symptoms: Rectal pain, depression
    •  Treatment: ???

  • Another case of MtB in DB systems: NoSQL
    •  There always was distributed and parallel databases research.
      –  Banned from first-rate publication venues
      –  Few systems built, not “sexy” enough.
    •  Then Google and Facebook wanted scalable databases, and we couldn’t offer them.
    •  Consequences today:
      –  A massive loss of prestige for our community
      –  A widely-held belief that one has to look for SOSP rather than SIGMOD for good DB research.
      –  Genuine contributions of the DB community do not get acknowledged and cited, but reinvented.

  • A third MtB case: DB Theory
    •  Estimated # of PODS papers talking of DSLs, ever: 0
    •  Pub. venues for foundational DSL work: POPL, SIGGRAPH, ASPLOS, …
      –  Citation in-degree into DB theory literature: ~0

  • Opportunities
    •  Many results from DB Theory, finite model theory on non-Turing complete languages carry over to modern DSLs.
    •  People in other domains do not know these results and find them exciting, when applied to their DSL.
    •  E.g. collection programming languages like Spark are essentially just nested relational algebra …
    •  My experience at a DSL summer school.
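
As an illustration of the nested relational algebra view of collection programming, the sketch below expresses the two characteristic operators, nest and unnest, over plain Scala lists. It deliberately uses the Scala standard library rather than the Spark API, and the names `NestedRelations`, `nest` and `unnest` are hypothetical.

```scala
object NestedRelations {
  // nest: turn a flat relation of (dept, employee) pairs into one tuple per
  // department that carries the list of its employees as a nested relation.
  def nest(flat: List[(String, String)]): List[(String, List[String])] =
    flat.map(_._1).distinct.map(d => (d, flat.filter(_._1 == d).map(_._2)))

  // unnest: flatten the nested relation back into (dept, employee) pairs.
  def unnest(nested: List[(String, List[String])]): List[(String, String)] =
    nested.flatMap { case (d, es) => es.map(e => (d, e)) }
}
```

For example, nesting `List(("db","ann"), ("pl","bob"), ("db","cy"))` gives `List(("db", List("ann","cy")), ("pl", List("bob")))`, and unnesting that result returns the same pairs grouped by department.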

  • Quiz

    From: K, “Exploiting Domain-Specific Knowledge: […] Part 1: Lessons on DSLs learned by the DB community”, DSL Design & Implementation Summer School, 2016.

    Consider the following DSL:
    •  purely functional Scala, with “if” as the only control structure
    •  Types built from Int, List, and tuples
    •  List ops: singleton constr, empty list, map(x => …), flatten, list concat ++
    •  Tuple construction (…) and projection _i
    •  (deep) equality test =; the identity function

    Let us call this language (Scala/List) Monad Calculus (MC) to have a label.

    Example:

    scala> val R = List(1)++List(2); val S = List(1)++List(3)
    R: List[Int] = List(1, 2)
    S: List[Int] = List(1, 3)

    scala> R.map(r => S.map(s =>
             if (r==s) List((r,s)) else List()).flatten).flatten
    res2: List[(Int, Int)] = List((1,1))

    for(r

  • Quiz: What can you do in MC?

    R.map(r => S.map(s =>
      if (r==s) List((r,s)) else List()).flatten).flatten

    •  Joins? --- yes
    •  Arbitrary “conjunctive queries” --- yes
    •  Arbitrary SQL select-from-where queries --- no, conditions ( x==a) == List()
    •  Aggregations: select count(*) from … --- no
    •  Testing on order / look sideways, sorting a list of integers? --- no
    •  Reachability in a graph given by the edge relation? --- no
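
To make the first two "yes" answers concrete, here is a small sketch that uses only the MC constructs listed in the quiz (map, flatten, if, equality, tuple construction and projection). The object and function names (`McExamples`, `join`, `twoHop`) are hypothetical.

```scala
object McExamples {
  // Equality join of two integer lists, in the same shape as the slide's example.
  def join(r: List[Int], s: List[Int]): List[(Int, Int)] =
    r.map(x => s.map(y => if (x == y) List((x, y)) else List()).flatten).flatten

  // A conjunctive query over an edge relation e:
  //   Q(a, c) :- E(a, b), E(b, c)   (paths of length two)
  def twoHop(e: List[(Int, Int)]): List[(Int, Int)] =
    e.map(p => e.map(q =>
      if (p._2 == q._1) List((p._1, q._2)) else List()).flatten).flatten
}
```

Every additional atom in a conjunctive query adds one more nested `map`/`flatten` pair, so a fixed query needs only a fixed nesting depth. Reachability would need a depth that grows with the input, and MC has no recursion or iteration to express that, which is consistent with the "no" answer above.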

  • Quiz: What can you do in MC?

    R.map(r => S.map(s =>
      if (r==s) List((r,s)) else List()).flatten).flatten

    •  Does every program terminate? --- yes
    •  How big is the largest value that can be produced? --- polynomial in input
    •  How quickly does every prog terminate? --- PTIME
    •  All queries of relational algebra --- yes!!!
    •  Only queries expressible in relational algebra --- yes %repr!!!!!!!!!!
    •  Can every program be parallelized? --- yes, fantastically well! (AC0)
      -- given polynomially much hardware, every program runs in CONSTANT time!!!!
      -- if you have only constantly much hardware => Brent Scheduling Principle

  • Quiz: Extending MC

    R.map(r => S.map(s =>
      if (r==s) List((r,s)) else List()).flatten).flatten

    •  Testing on order / look sideways, sorting a list of integers? --- no
    •  List.map preserves order but can’t “query” it.
    •  But what if I want a DSL that can do this? Could add List.foldLeft, and nothing else.
    •  Does every program still terminate? --- yes
    •  Does every program still run in PTIME? --- no, nonelementary!
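
A minimal sketch of what "looking sideways" means once `foldLeft` is available: the function below pairs each list element with its predecessor, something `map` alone cannot do because it sees one element at a time. The names `McWithFold` and `adjacentPairs` are hypothetical; apart from `foldLeft`, only MC constructs (map, ++, tuples) are used.

```scala
object McWithFold {
  // Returns the list of (previous, current) pairs of neighbouring elements.
  // The fold state is a pair: a list holding the previous element (empty at
  // the start) and the pairs collected so far.
  def adjacentPairs(xs: List[Int]): List[(Int, Int)] = {
    val start = (List[Int](), List[(Int, Int)]())
    val (_, pairs) = xs.foldLeft(start) { (state, x) =>
      val (prev, acc) = state
      (List(x), acc ++ prev.map(p => (p, x)))
    }
    pairs
  }
}
```

For instance, `adjacentPairs(List(3, 1, 2))` evaluates to `List((3,1), (1,2))`. This particular function is cheap, but the accumulator of a fold may itself be a list that the next fold iterates over, and it is this kind of nesting that lets value sizes grow beyond any polynomial.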

  • The FO[X] DSL Zoo

  • Database theory work that we need more of

    1.  Results on complexity and efficiency that systems people can understand to be relevant to them, and which carry over to new languages, e.g.
      –  Georg’s work on hypertree decompositions
      –  Result cardinality bounds: AGM bound
      –  Worst-case optimal joins
      –  …
    2.  Results that bridge the gap between PODS and POPL/SIGGRAPH/ASPLOS work.

  • Summary
    •  Try not to miss the DSL boat.
    •  If this advice is useful to you, you ultimately have Georg to thank for it :-)