using d for development of large scale primary storagedconf.org/2016/talks/zvibel.pdf · • 100%...

28
Using D for Development of Large Scale Primary Storage Liran Zvibel Weka.IO, CTO [email protected] @liranzvibel 1 #DConf2016

Upload: others

Post on 07-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

UsingDforDevelopmentofLargeScalePrimaryStorage

LiranZvibelWeka.IO,[email protected]@liranzvibel 1#DConf2016

Page 2: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

• Weka.IOIntroduction

• Ourprogresssincewepickedoff• ExampleswhereDreallyshines

• Ourchallenges• Improvementssuggestions

• Q&A

2

Agenda

DforPrimaryStorage#DConf2016

Page 3: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

3

Weka.IOIntroduction

Page 4: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

• Enablingcloudsandenterpriseswithasinglestoragesolutionforresilience,performance,scalabilityandcostefficiency

• HQinSanJose,CA;R&DinTelAviv,Israel• 30engineers,vaststorageexperience• VCbackedcompany;SeriesBledbyWaldenInternational;SeriesAledbyNorwestVenturePartners

• Productusedinproductionbyearlyadopters(stillinstealth)• Over200klocofourownDcode,about35packages

4

AboutWeka.IO

DforPrimaryStorage#DConf2016

Page 5: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

• Extremelyreliable,“alwayson”,state-full.

• Highperformancedatapath,measuredinµsecs

• Complicated“controlpath”/“managementcode”

• DistributednatureduetoHArequirements

• LowlevelinteractionwithHWdevices

• Somekernel-levelcode,someassembly

• Languagehastobeefficienttoprogram,andfitforlargeprojects

5

Storagesystemrequirements

DforPrimaryStorage#DConf2016

Page 6: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

• Softwareonlysolution• User-spaceprocesses• 100%CPU,pollingbasedonnetworkingandstorage• Asynchronousprogrammingmodel,usingFibersandaReactor

• Memoryefficient,zero-copyeverything,verylowlatency

• GCfree,lock-freeefficientdatastructures• ProprietarynetworkingstackfromEthernettoRPC

6

TheWeka.IOframework

DforPrimaryStorage#DConf2016

Page 7: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

7

OurProgress

Page 8: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

• Nomoreshow-stoppers,stillalongwaytogo• Indeedproductivityisveryhigh,verygoodcode-to-featuresratio• Weareableto“rapidprototype”featuresandthenironthem• Allmajorruntimeissuesresolved• Wegetgreatperformance

• ChoosingDwasagoodmove,andprovedtobeahugesuccess

8

CurrentstateforWeka

DforPrimaryStorage#DConf2016

Page 9: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

• SwitchedtoLDC(thanksDavidNadlingerandtheLDCteam!)

• Compilationisnowbypackage• BetterRAM“management”• Leveragingparallelismtospeedbuildtime

• Recentfront-ends“feel”muchmorestable

• LDCletsusbuildoptimizedcompilationwithasserts,whichisagoodthingforQA.

9

Compilationprogress

DforPrimaryStorage#DConf2016

Page 10: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

• Gotover100%performanceboostoverDMD• Whencompilingasasinglepackagewithoptimizations• Fiberswitchingbasedonregistersandnotpthreads• NoGCallocationwhenthrowingandhandlingexceptions(ThanksMithun!)• Integratelibunwindwithdwarfsupportforstacktraces(no--disable-fp-elim)• Supportdebug(-g)withbackendoptimizations• Templateinstantiationbug—stillunresolvedfortheupstream• @ldc.attribute.section(“SECTIONNAME”) • -staticflagtoldc,allowingeasycompileandshipmentofutilities

10

LDCstatus

DforPrimaryStorage#DConf2016

Page 11: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

• Wenowcheckhowmuchweallocated(usinghacks,apiwouldbenice)fromtheReactor,anddecidetocollectifweallocatedmorethan20MB

• Collectionactuallyhappensveryinfrequently(fewtimesinanhour)

• Collectiontimeisde-synchronizedacrossthecluster

• Collectiontimestillsignificant—about10ms

• Maindrawback—allocationMAYtake‘infinite’amountoftimeifkernelisstressedonmemory.

11

GCallocationandlatency

DforPrimaryStorage#DConf2016

Page 12: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

• ExceptionhandlingcodewasmodifiedtoneverrelyonGCallocation• ReactorandFiberscode(+ourTraceInfoclass)modifiedtokeepthetraceinafiberlocalstate.✴Problem:potentiallythrowingfromscope(exit/success/failure) • Throwablesareaclass,soallocatingthemcomesfromtheGC,mustbestaticallyallocated:• static __gshared auto ex = new Exception(“:o(”);

12

ExceptionsandGC

DforPrimaryStorage#DConf2016

Page 13: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

13

CodeTidbits

Page 14: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

• _genkeepsincrementingwhenbuffetsallocatedfrompools• Pointersremembertheirgenerations,andvalidateaccurateaccess• Helpsdebuggingstalepointers• problemwithimplicitcastsofnull,aliasthisisnotstrongenough.Maybesomesyntaxcouldhelp

14

NetworkBufferPtr

DforPrimaryStorage#DConf2016

@nogc @property inout(NetworkBuffer)* get() inout nothrow pure {autoptr=cast(NetworkBuffer*)(_addr>>MAGIC_BITS);assert(ptrisnull||(_addr&MAGIC_MASK)==ptr._gen);returnptr;}

aliasgetthis;

Page 15: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

switch (pkt.header.type) { foreach(name; __traits(allMembers, PacketType)) {     case __traits(getMember, PacketType, name):         return __traits(getMember, this, "handle" ~ name)(pkt);}

15

Handlingallenumvalues

DforPrimaryStorage#DConf2016

• SimilarsolutionverifiesallfieldsinaCstructhavethesameoffset,naturallytheCpartendsupbeingmuchmorecomplex.

Page 16: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

@propertyboolflag(stringNAME)(){return(_flags&__traits(getMember,NBFlags,NAME))!=0;}@propertyvoidflag(stringNAME)(boolval){if(val){_flags|=__traits(getMember,NBFlags,NAME);}else{_flags&=~__traits(getMember,NBFlags,NAME);}}buffer.flag!"TX_ACK"=true;

16

Flagsetting/testing

DforPrimaryStorage#DConf2016

Page 17: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

staticif(JoinedKV.sizeof<=CACHE_LINE_SIZE){aliasKV=JoinedKV;enumseparateKV=false;}else{structKV{Kkey;/*valueswillbestoredseparatelyforbettercachebehavior*/}V[NumEntries]values;enumseparateKV=true;}

17

Efficientpacking

DforPrimaryStorage#DConf2016

Page 18: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

18

Challenges

Page 19: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

• Projectisbrokeninto~35packages.• Somelogicalpackagesarecompiledasseveralsmallerpackages• Current2.0.68.2compilerhasseveralpackagescompiledabout90seconds,leadingtototalcompiletimeof4-5minutes.• Newer2.070.2+PGOcompilerreducestimebyabout35%(ThanksJohan!).Stillgetting3-4minutespercompletecompile.

19

Compilationtime

DforPrimaryStorage#DConf2016

Page 20: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

• Introducemoreparallelismintothebuildprocess• Supportincrementalcompiles.• Nowwhenadependencyischanged,completepackageshavetobecompletelyrebuilt.Inmanycases,mostoftheworkisredundant

• WhendependencyIMPLEMENTATIONischanged,stilleverythinggetsrecompiled

• Support(centralized)cachingforbuildresults.

• Don'tlethumans“contextswitch”whilewaitingforthecompiler!

20

Compiletimeimprovementsuggestions

DforPrimaryStorage#DConf2016

Page 21: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

• Totalsymbols:99649,over1k:9639,over500k:102,over1M:62• Longestsymbolwas5M!

• Makesworkingwithstandardtoolsmuchharder(somenmtoolscrashontheexe).• Asimplehashingsolutionwasimplementedinourspecialcompiler• Demanglingnowstoppedworkingforus,weonlygetmodule/funcname

• Moretimeisspentonhashingthanwhatissavedonlinkage.Wemayneeda“native”solution.

21

LongSymbols

DforPrimaryStorage#DConf2016

Page 22: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

privatestructMapResult(aliasfun,Range,ARGS…){ARGS_args;aliasR=Unqual!Range;R_input;this(Rinput,ARGSargs){_input=input;_args=args;}@propertyautoreffront(){returnfun(_input.front,_args);}…autounder_value_gc(R)(Rr,intvalue){returnr.filter!(x=>x<value);}

autounder_value_nogc(R)(Rr,intvalue){returnr.xfilter!((x,y)=>x<y)(value);}

automultiple_by_gc(R)(Rr,intvalue){returnr.map!(x=>x*value);}

automultiple_by_nogc(R)(Rr,intvalue){returnr.xmap!((x,y)=>x*y)(value);}

22

PhobosAlgsForcingGC

DforPrimaryStorage#DConf2016

Page 23: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

23

ImprovementIdeas

Page 24: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

• Makeitexplicit• Allowittomanipulatetypes,toreplacecomplextemplaterecursion

24

static foreach

DforPrimaryStorage#DConf2016

templatehasUDAttributeOfType(T,aliasX){aliasattrs=TypeTuple!(__traits(getAttributes,X));

templatehelper(inti){staticif(i>=attrs.length){enumhelper=false;}elsestaticif(is(attrs[i]==T)||is(typeof(attrs[i])==T)){staticassert(!helper!(i+1),"Morethanonematchingattribute:"~attrs.stringof);enumhelper=true;}else{enumhelper=helper!(i+1);}}enumhasUDAttributeOfType=helper!0;}

Page 25: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

• Specifysome@UDAsastransitive,sohecompilercanhelp“prove”correctness.• Forexample:• Definefunctionas@atomicifitdoesnotcontextswitch• Functionmaybe@atomicifitonlycalls@atomicfunctions• Nextstepwouldbetoprovethatnocontextswitchhappens• Canbeimplementedin“runtime”ifthereisa__traitsthatreturnsallthefunctionsthatafunctionmaycall.• Nextphasewouldbetobeableto‘prove’thingsonthefunctions,so@nogc,nothrow,pureetccanusethesamemechanism.

25

Transitive@UDA

DforPrimaryStorage#DConf2016

Page 26: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

• __traitsthatreturnsthatmaxstacksizeofafunction

• Addapredicatethattellswhetherthereisanexistingexceptioncurrentlyhandled

• DonateWeka’s@nogc‘standardlibrary’toPhobos:• OurFiberadditionsintoPhobos(throwInFiber,TraceInfosupport,etc)[otherlibfunsaswell]

• Containers,algorithms,locklessdatastructures,etc…

26

OtherSuggestions

DforPrimaryStorage#DConf2016

Page 27: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

Questions?PetaExaZettaYottaXennaWeka(1030)

Page 28: Using D for Development of Large Scale Primary Storagedconf.org/2016/talks/zvibel.pdf · • 100% CPU, polling based on networking and storage • Asynchronous programming model,

DforPrimaryStorage#DConf2016 28

Table 1

0.68 0.70 0.70 + PGO88.4 58.1 54.784.3 57.1 54.775.8 51.0 49.767.5 40.7 37.359.5 36.4 43.156.6 38.3 35.351.9 35.9 32.450.4 30.9 34.644.9 25.2 26.842.7 31.0 27.035.7 31.3 30.235.1 24.5 22.931.4 21.1 17.730.5 20.5 19.925.8 20.1 16.319.0 13.5 15.418.3 12.0 11.314.3 10.6 10.413.7 14.0 13.69.4 6.8 6.3

• 2.0.70.2isamajorimprovementincompiletimeoverthe2.068.2• Still,the30-40%improvementmeanthatengineershavetowaitlongminutestogetthewholeexetobuild.

• We’rebreakinglargepackageintosmallerones,whenpossible