Ruby Hacking Guide
Preface
This book explores several themes with the following goals in mind:
To gain knowledge of the structure of ruby
To gain knowledge about language processing systems in general
To acquire skills in reading source code
Ruby is an object-oriented language developed by Yukihiro Matsumoto. The official implementation of the Ruby language is called ruby. It is actively developed and maintained by the open source community. Our first goal is to understand the inner workings of the ruby implementation. This book investigates ruby as a whole.
Secondly, by knowing about the implementation of Ruby, we will be able to learn about other language processing systems. I tried to cover all topics necessary for implementing a language, such as hash tables, the scanner and parser, the evaluation procedure, and many others. Because this book is not intended as a textbook, covering every area and idea without omission was not reasonable. However, the parts relating to the essential structures of a language implementation are adequately explained. A brief summary of the Ruby language itself is also included so that readers who don’t know Ruby can read this book.
The main themes of this book are the first and second points above. What I want to emphasize the most, though, is the third one: to acquire skill in reading source code. I dare to say it’s a “hidden” theme. I will explain why I thought it was necessary.
It is often said, “To be a skilled programmer, you should read source code written by others.” This is certainly true. But I haven’t found a book that explains how you can actually do it. There are many books that explain OS kernels and the interior of language processing systems by showing the concrete structure, or “the answer,” but they don’t explain the way to reach that answer. It’s clearly one-sided.
Can you naturally read code just because you know how to write a program? Is reading code so easy that everyone in this world can read code written by others with no sweat? I don’t think so. Reading programs is certainly as difficult as writing programs.
Therefore, this book does not simply explain ruby as something already known; rather, it demonstrates the process of analysis as graphically as possible. Though I think I’m a reasonably seasoned Ruby programmer, I did not fully understand the inner structure of ruby when I started to write this book. In other words, regarding the content of ruby, I started from a position as close as possible to that of the readers. This book is the summary of both the analysis that started from that point and its result.
I asked Yukihiro Matsumoto, the author of ruby, for supervision. But I thought the spirit of this book would be lost if each analysis was monitored by the author of the language himself. Therefore I limited his review to the final stage of writing. In this way, without losing the sense of actually reading the source code, I think I could also assure the correctness of the contents.
To be honest, this book is not easy. At the very least, its simplicity is limited by the inherent complexity of its aim. However, this complexity may be what makes the book interesting to you. Do you find it interesting to chatter about something that is a piece of cake? Do you sit down at your desk to solve a puzzle whose answer you know in a heartbeat? How about a suspense novel whose criminal you can guess halfway through? If you really want to arrive at new knowledge, you need to solve a problem that engages all your capacities. This is the book that lets you practice such idealism exhaustively. “It’s interesting because it’s difficult.” I’ll be glad if the number of people who think so increases because of this book.
Target audience
Firstly, knowledge of the Ruby language isn’t required.
However, since knowledge of the Ruby language is absolutely necessary to understand certain explanations of its structure, supplementary explanations of the language are inserted here and there.
Knowledge of the C language is required to some extent. I assume you can allocate some structs with malloc() at runtime to create a list or a stack, and that you have used function pointers at least a few times.
Also, since the basics of object-oriented programming will not be explained in much depth, you will probably have a difficult time without experience in at least one object-oriented language. In this book, I tried to use many examples in Java and C++.
Structure of this book
This book has four main parts:
Part 1: Objects
Part 2: Syntactic analysis
Part 3: Evaluation
Part 4: Peripheral around the evaluator
Supplementary chapters are included at the beginning of each part when necessary. These provide a basic introduction for those who are not familiar with Ruby and the general mechanism of a language processing system.
Now, let us go through an overview of the four main parts. The symbol in parentheses after each explanation indicates the difficulty. The grades are (C), (B), (A) in order from easy to hard, with (S) being the highest.
Part 1: Objects
Chapter 1: Covers the basics of Ruby to get ready for Part 1. (C)
Chapter 2: Gives the concrete inner structure of Ruby objects. (C)
Chapter 3: Discusses hash tables. (C)
Chapter 4: Describes the Ruby class system. You may read through this chapter quickly at first, because it tells plenty of abstract stories. (A)
Chapter 5: Shows the garbage collector, which is responsible for generating and releasing objects. The first story in the low-level series. (B)
Chapter 6: Describes the implementation of global variables, class variables, and constants. (C)
Chapter 7: Outlines the security features of Ruby. (C)
Part 2: Syntactic analysis
Chapter 8: Talks about an almost complete specification of the Ruby language, in order to prepare for Part 2 and Part 3. (C)
Chapter 9: An introduction to yacc, at least enough to read the syntax file. (B)
Chapter 10: Looks through the rules and physical structure of the parser. (A)
Chapter 11: Explores the periphery of lex_state, which is the most difficult part of the parser. The most difficult part of this book. (S)
Chapter 12: Finalization of Part 2 and connection to Part 3. (C)
Part 3: Evaluation
Chapter 13: Describes the basic mechanism of the evaluator. (C)
Chapter 14: Reads the evaluation stack that creates the main context of Ruby. (A)
Chapter 15: Talks about the search and initialization of methods. (B)
Chapter 16: Delves into the implementation of the iterator, the most characteristic feature of Ruby. (A)
Chapter 17: Describes the implementation of the eval methods. (B)
Part 4: Peripheral around the evaluator
Chapter 18: Run-time loading of libraries in C and Ruby. (B)
Chapter 19: Describes the implementation of threads at the end of the core part. (A)
Environment
This book describes the ruby 1.7.3 2002-09-12 version. It’s included on the attached CD-ROM. Choose any one of ruby-rhg.tar.gz, ruby-rhg.lzh, or ruby-rhg.zip according to your convenience. The content is the same for all. Alternatively, you can obtain it from the support site of this book (http://i.loveruby.net/ja/rhg/).
For the publication of this book, the following build environments were prepared to confirm compilation and to test the basic operation. The details of this build test are given in doc/buildtest.html on the attached CD-ROM. However, this does not necessarily mean that ruby will run even under the same environments listed here. The author doesn’t guarantee in any form the execution of ruby.
BeOS 5 Personal Edition/i386
Debian GNU/Linux potato/i386
Debian GNU/Linux woody/i386
Debian GNU/Linux sid/i386
FreeBSD 4.4-RELEASE/Alpha (Requires the local patch for this book)
FreeBSD 4.5-RELEASE/i386
FreeBSD 4.5-RELEASE/PC98
FreeBSD 5-CURRENT/i386
HP-UX 10.20
HP-UX 11.00 (32bit mode)
HP-UX 11.11 (32bit mode)
Mac OS X 10.2
NetBSD 1.6F/i386
OpenBSD 3.1
Plamo Linux 2.0/i386
Linux for PlayStation 2 Release 1.0
Redhat Linux 7.3/i386
Solaris 2.6/Sparc
Solaris 8/Sparc
UX/4800
Vine Linux 2.1.5
Vine Linux 2.5
VineSeed
Windows 98SE (Cygwin, MinGW+Cygwin, MinGW+MSYS)
Windows Me (Borland C++ Compiler 5.5, Cygwin, MinGW+Cygwin, MinGW+MSYS, Visual C++ 6)
Windows NT 4.0 (Cygwin, MinGW+Cygwin)
Windows 2000 (Borland C++ Compiler 5.5, Visual C++ 6, Visual C++.NET)
Windows XP (Visual C++.NET, MinGW+Cygwin)
These numerous tests were not the effort of the author alone. The test builds couldn’t have been achieved without the magnificent cooperation of the people listed below.
I’d like to extend my warmest thanks from my heart.
Tietew
kjana
nyasu
sakazuki
Masahiro Sato
Kenichi Tamura
Morikyu
Yuya Kato
Yasuhiro Kubo
Kentaro Goto
Tomoyuki Shimomura
Masaki Sukeda
Koji Arai
Kazuhiro Nishiyama
Shinya Kawaji
Tetsuya Watanabe
Naokuni Fujimoto
However, the author bears the responsibility for this test. Please refrain from attempting to contact these people directly. If there’s any flaw in execution, please contact the author by e-mail: [email protected].
Web site
The web site for this book is http://i.loveruby.net/ja/rhg/. I will add information about related programs and additional documentation, as well as errata. In addition, I’m going to publish the first few chapters of this book there at the time of its release. I will look for an opportunity to publish more chapters, and in the end the whole contents of the book will be on this web site.
Acknowledgment
First of all, I would like to thank Mr. Yukihiro Matsumoto. He is the author of Ruby, and he made it available to the public as open source software. Not only did he willingly approve my publishing a book that analyzes ruby, but he also agreed to supervise its content. In addition, he helped my stay in Florida with simultaneous translation. There are more things than I can enumerate for which I have to thank him. Instead of writing them all, I give this book to him.
Next, I would like to thank arton, who proposed that I publish this book. The words of arton always move me. One of the things I’m currently struggling with because of his words is that I no longer have a reason not to get a .NET machine.
Koji Arai, the ‘captain’ of documentation in the Ruby community, conducted a scrutinizing review as if he had become the official editor of this book, though I had not been told so. I thank him for all his reviews.
I’d also like to mention those who gave me comments, pointed out mistakes, and submitted proposals about the construction of the book throughout my work.
Tietew, Yuya, Kawaji, Gotoken, Tamura, Funaba, Morikyu, Ishizuka, Shimomura, Kubo, Sukeda, Nishiyama, Fujimoto, Yanagawa (I’m sorry if anyone is missing): I thank all those people who contributed.
As a final note, I thank Otsuka, Haruta, and Kanemitsu for arranging everything even though I broke the deadline as many as four times and the manuscript ran 200 pages longer than originally planned.
I cannot give the full list here to mention the names of all the people who contributed to this book, but I can say that I couldn’t have published this book successfully without such assistance. Let me take this place to express my appreciation. Thank you very much.
Minero Aoki
If you want to send remarks, suggestions, or reports of typographical errors, please address them to Minero Aoki <[email protected]>.
“Rubyソースコード完全解説” can be reserved/ordered at ImpressDirect.
Copyright © 2002-2004 Minero Aoki, All rights reserved.
The original work is Copyright © 2002-2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
Introduction
Characteristics of Ruby
Some readers may already be familiar with Ruby, but (I hope) there are also many readers who are not. First let’s go through a rough summary of the characteristics of Ruby for such people.
Hereafter capitalized “Ruby” refers to Ruby as a language specification, and lowercase “ruby” refers to the ruby command as an implementation.
Development style
Ruby is a language that is being developed by the hand of Yukihiro Matsumoto as an individual. Unlike C or Java or Scheme, it does not have any standard. The specification is merely shown as an implementation, ruby, and it is varying continuously. For good or bad, it’s free.
Furthermore, ruby itself is free software. It’s probably necessary to mention at least two points here: the source code is open to the public, and it is distributed free of charge. Thanks to these conditions, an attempt like this book can be approved.
If you’d like to know the exact license, you can read README and LEGAL. For the time being, I’d like you to remember that you can do at least the following things:
You can redistribute the source code of ruby
You can modify the source code of ruby
You can redistribute a copy of the source code with your modifications
There is no need for special permission or payment in any of these cases.
By the way, the purpose of this book is to read the original ruby, so the target source is unmodified unless otherwise specified. However, whitespace, newlines, and comments were added or removed without notice.
It’s conservative
Ruby is a very conservative language. It is equipped with only carefully chosen features that have been tested and refined in a variety of languages. Therefore it doesn’t have many fresh and experimental features. So it tends to appeal to programmers who put importance on practical functionality. Dyed-in-the-wool hackers such as Scheme and Haskell lovers don’t seem to find ruby appealing, at least at a quick glance.
The library is conservative in the same way. Clear and unabbreviated names are given to new functions, while names that appear in C and Perl libraries have been taken over from them. For example: printf, getpwent, sub, and tr.
It is also conservative in implementation. Assembler is not an option it takes for seeking speed. Portability is always given higher priority when it conflicts with speed.
It is an object-oriented language
Ruby is an object-oriented language. It is absolutely impossible to exclude this from the features of Ruby.
I will not spend pages of this book on what an object-oriented language is. As for the object-oriented features of Ruby, the code that is about to be explained is itself the best sample.
It is a script language
Ruby is a script language. It also seems absolutely impossible to exclude this from the features of Ruby. To gain everyone’s agreement, an introduction of Ruby must include “object-oriented” and “script language”.
However, what is a “script language”, for example? I couldn’t figure out a definition successfully. For example, John K. Ousterhout, the author of Tcl/Tk, defines it as an “executable language using #! on UNIX”. There are other definitions depending on the viewpoint, such as a language that can express a useful program in only one line, or one that can execute code by passing a program file on the command line, etc.
However, I dare to use another definition, because I don’t find much interest in “what” a script language is. I have only one measure for deciding to call something a script language: whether no one would complain about calling it a script language. To fulfill this, I define the meaning of “script language” as follows.
A language that its author calls a “script language”.
I’m sure this definition will never fail. And Ruby fulfills this point. Therefore I call Ruby a “script language”.
It’s an interpreter
ruby is an interpreter. That’s the fact. But why is it an interpreter? For example, couldn’t it be made as a compiler? It must be because in some respects being an interpreter is better than being a compiler… at least for ruby, it must be better. Well, what is good about being an interpreter?
As a preparation step for investigating this, let’s start by thinking about the difference between an interpreter and a compiler. If we attempt a theoretical comparison of how a program is executed, there’s no difference between an interpreted language and a compiled language. Since even compiled code works by letting the CPU interpret the machine language, it may be possible to say that it too works as an interpreter. Then where is the place that actually makes a difference? It is a more practical place: the process of development.
I know that somebody, as soon as they hear “in the process of development”, will claim with a stereotypical phrase that an interpreter removes the effort of compilation and so makes development easier. But I don’t think that’s accurate. A language could be designed so that it doesn’t show the process of compilation. Actually, Delphi can compile a project by hitting just F5. Complaints about long compilation times derive from the size of the project or the optimization of the code. Compilation itself is not to blame.
Well, why do people perceive interpreters and compilers so differently? I think it is because language developers so far have chosen one implementation or the other based on the traits of each language. In other words, if it is a language for a comparatively small purpose such as a daily routine, it would be an interpreter. If it is for a large project where a number of people are involved in the development and accuracy is required, it would be a compiler. That may be because of speed, as well as the ease of creating the language.
Therefore, I think “it’s handy because it’s an interpreter” is an outsized myth. Being an interpreter doesn’t necessarily contribute to ease of use; rather, seeking ease of use naturally leads you toward building an interpreted language.
Anyway, ruby is an interpreter; this fact matters for where this book is heading, so I emphasize it here again. Though I don’t know about “it’s handy because it is an interpreter”, ruby is in any case implemented as an interpreter.
High portability
Even with the problem that its interfaces are fundamentally Unix-centered, I would insist that ruby is highly portable. It doesn’t require any extremely unfamiliar library. It has only a few parts written in assembler. Therefore porting to a new platform is comparatively easy. Namely, it currently works on the following platforms.
Linux
Win32 (Windows 95, 98, Me, NT, 2000, XP)
Cygwin
djgpp
FreeBSD
NetBSD
OpenBSD
BSD/OS
Mac OS X
Solaris
Tru64 UNIX
HP-UX
AIX
VMS
UX/4800
BeOS
OS/2 (emx)
Psion
I heard that the main machine of the author Matsumoto runs Linux. Thus when using Linux, you should not fail to compile at any time.
Furthermore, you can expect stable functionality on a (typical) Unix environment. Considering the release cycle of packages, the primary choice of environment for playing around with ruby should currently be a flavor of PC UNIX.
On the other hand, the Win32 environment definitely tends to cause problems. The large gaps in the targeted OS model tend to cause problems around the machine stack and the linker. Yet recently Windows hackers have contributed to better support. I use a native ruby on Windows 2000 and Me. Once it runs successfully, it doesn’t seem to show special concerns like frequent crashing. The main problems on Windows may be the gaps in the specifications.
Another type of OS that many people may be interested in is probably Mac OS (prior to v9) and handheld OSes like Palm.
Around ruby 1.2 and before, it supported legacy Mac OS, but the development seems to be suspended. Even compilation doesn’t get through. The biggest causes are the compiler environment of legacy Mac OS and the decrease in developers. As for Mac OS X, there are no worries because its body is UNIX.
There seem to have been several discussions about porting to Palm, but I have never heard of a successful project. I guess the difficulty lies in the necessity of settling specification-level standards such as stdio on the Palm platform, rather than in the actual implementation. Well, I saw that a port to Psion has been done ([ruby-list:36028]).
How about the hot stories about VMs seen in Java and .NET? Because I’d like to talk about them together with the implementation, this topic will be in the final chapter.
Automatic memory control
Functionally it’s called GC, or Garbage Collection. Speaking in terms of the C language, this feature allows you to skip free() after malloc(). Unused memory is detected by the system automatically and released. It’s so convenient that once you get used to GC you won’t be willing to do such manual memory control again.
Topics about GC have become common because of its popularity in recent languages with GC as a standard feature, and it is fun that its algorithms can still be improved further.
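As a small illustration of "skipping free()", the following sketch creates many short-lived strings and never releases them by hand. It assumes a current version of the reference implementation, where the GC module provides GC.start and GC.count; neither the helper names nor these calls come from the book, and ruby 1.7 itself did not offer GC.count.

```ruby
# Create n temporary strings. Nothing here frees them explicitly;
# once the block returns, they are unreachable and left to the GC.
def make_garbage(n)
  n.times { "temporary string " * 10 }
  nil
end

# Hypothetical helper: returns how many collections ran while we
# produced garbage and then explicitly asked for a collection.
def collections_while_working
  before = GC.count          # number of GC runs so far (assumes modern Ruby)
  make_garbage(10_000)
  GC.start                   # request a collection explicitly
  GC.count - before
end

puts "GC ran #{collections_while_working} time(s)"
```

The exact number printed depends on the interpreter and its settings, so no particular output should be expected.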
Typeless variables
The variables in Ruby don’t have types. The reason is probably that typeless variables conform better with polymorphism, which is one of the strongest advantages of an object-oriented language. Of course a language with typed variables has a way to deal with polymorphism. What I mean here is that typeless variables conform better.
The level of “better conformance” in this case is about the same as “handy”. Sometimes it is of crucial importance, and sometimes it doesn’t matter in practice. Yet this is certainly an appealing point if a language aims to be “handy and easy”, and Ruby does.
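A minimal sketch of what "typeless variables conform with polymorphism" can look like in practice. The classes Circle and Square and the method total_area are made up for this illustration; they are not taken from the book.

```ruby
# Two unrelated classes that happen to respond to the same message.
class Circle
  def initialize(radius)
    @radius = radius
  end

  def area
    3 * @radius * @radius   # crude approximation, enough for the sketch
  end
end

class Square
  def initialize(side)
    @side = side
  end

  def area
    @side * @side
  end
end

# `shape` has no declared type: each element may be of any class,
# as long as it responds to `area`.
def total_area(shapes)
  total = 0
  shapes.each do |shape|
    total += shape.area
  end
  total
end

shape = Circle.new(1)
shape = Square.new(2)   # the same variable now refers to another class
puts shape.area
puts total_area([Circle.new(1), Square.new(2)])
```

No common base class or interface declaration is needed; that is the sense in which the variable "conforms" to whatever object it refers to.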
Most of syntactic elements are expressions
This topic is probably difficult to understand instantly without a little supplemental explanation. For example, the following C-language program results in a syntax error.
    result = if (cond) { process(val); } else { 0; }
This is because the C-language syntax defines if as a statement. But you can write it as follows.
    result = cond ? process(val) : 0;
This rewrite is possible because the conditional operator (a?b:c) is defined as an expression.
On the other hand, in Ruby, you can write as follows because if is an expression.
    result = if cond then process(val) else nil end
Roughly speaking, if it can be an argument of a function or a method, you can consider it as an expression.
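The point can be tried directly. In this sketch, process, describe and size_label are hypothetical helpers invented for the illustration; only the if-as-expression form itself comes from the text.

```ruby
# Hypothetical stand-in for the process(val) used in the text.
def process(val)
  val * 2
end

def describe(cond, val)
  # `if` yields a value, so it can be the right-hand side of an assignment.
  result = if cond then process(val) else nil end
  result
end

# `case` is an expression in the same way; its value is returned here.
def size_label(n)
  case n
  when 0..9 then "small"
  else "large"
  end
end

# Because `if` is an expression, it can be written directly as an argument.
puts(if 1 < 2 then "yes" else "no" end)
p describe(true, 3)
p describe(false, 3)
puts size_label(15)
```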
Of course, there are other languages whose syntactic elements are mostly expressions. Lisp is the best example. Because of this characteristic, there seem to be many people who feel that “Ruby is similar to Lisp”.
Iterators
Ruby has iterators. What is an iterator? Before getting into iterators, I should mention the necessity of using an alternative term, because the word “iterator” has fallen out of favor recently. However, I don’t have a good alternative. So let us keep calling it “iterator” for the time being.
Well again, what is an iterator? If you know higher-order functions, for the time being you can regard it as something similar. In C-language, the counterpart would be passing a function pointer as an argument. In C++, it would be a method that encloses the operation part of an STL Iterator. If you know sh or Perl, it’s good to imagine something like a custom for statement which we can define.
Yet, the above are merely examples of “similar” concepts. All of them are similar, but they are not identical to Ruby’s iterator. I will tell the precise story later, when the time is right.
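As a rough illustration of the "custom for statement" analogy, the sketch below defines two methods that hand values to a block with yield. The method names each_twice and countdown are invented here and do not appear in ruby.

```ruby
# Hypothetical iterator: passes every element to the block twice.
def each_twice(items)
  items.each do |item|
    yield item
    yield item
  end
end

# Hypothetical "custom loop statement": counts down from n to 1.
def countdown(n)
  while n > 0
    yield n
    n -= 1
  end
end

collected = []
each_twice([1, 2]) { |n| collected << n }
p collected

countdown(3) { |i| puts i }
```

The caller supplies the loop body and the method decides how often, and with which values, that body runs. This is only the surface behavior; how ruby implements it is a separate question.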
Written in C-language
Being written in C-language is not notable these days, but it’s still a characteristic for sure. At least it is not written in Haskell or PL/I, so there’s a high possibility that ordinary people can read it. (Whether it is truly so, I’d like you to confirm by yourself.)
Well, I just said it’s in C-language, but the actual language version which ruby targets is basically K&R C. Until a little while ago, there were a decent number of K&R-only environments (though not plenty). But recently there are few environments that do not accept programs written in ANSI C, so technically there’s no problem in moving on to ANSI C. However, partly because of the author Matsumoto’s personal preference, it is still written in K&R style.
For this reason, the function definitions are all in K&R style, and the prototype declarations are not written very rigorously. If you carelessly specify the -Wall option of gcc, plenty of warnings will be shown. If you try to compile it with a C++ compiler, it will complain about prototype mismatches and fail to compile. … These kinds of stories are often reported to the mailing list.
Extension library
We can write a Ruby library in C and load it at runtime without recompiling Ruby. This type of library is called a “Ruby extension library” or just an “extension library”.
Not only the fact that we can write it in C, but also the very small difference in how code is expressed at the Ruby level and at the C level is a significant trait. The operations available in Ruby can be used from C in almost the same way. See the following example.
    # Method call
    obj.method(arg)                                 # Ruby
    rb_funcall(obj, rb_intern("method"), 1, arg);   # C
    # Block call
    yield arg        # Ruby
    rb_yield(arg);   # C
    # Raising exception
    raise ArgumentError, 'wrong number of arguments'       # Ruby
    rb_raise(rb_eArgError, "wrong number of arguments");   # C
    # Generating an object
    arr = Array.new             # Ruby
    VALUE arr = rb_ary_new();   # C
This is good because it makes composing an extension library easy, and it actually gives ruby an indispensable advantage. However, it’s also a burden for the ruby implementation. You can see its effects in many places. The effects on GC and thread processing are particularly prominent.
Thread
Ruby is equipped with threads. Assuming that very few people know nothing about threads these days, I will omit an explanation of threads themselves and go straight into the details.
ruby’s thread is a user-level thread written from scratch. The characteristic of this implementation is very high portability in both specification and implementation. Surprisingly, even MS-DOS can run the threads. Furthermore, you can expect the same behavior in any environment. Many people mention that this point is the best feature of ruby.
However, as a trade-off for such extreme portability, ruby abandons speed. It is probably the slowest of all user-level thread implementations in this world. The tendency of the ruby implementation may be seen here most clearly.
Technique to read source code
Well. After an introduction of ruby, we are about to start reading source code. But wait.
Any programmer has to read source code at some point, but I guess there are not many occasions when someone teaches you concretely how to read it. Why? Does it mean you can naturally read a program if you can write a program?
But I don’t think reading programs written by other people is so easy. Just as with writing programs, there must be techniques and theories for reading programs. And they are necessary. Therefore, before starting to read ruby, I’d like to give a general summary of the approach you need to take in reading source code.
Principles
At first, I will state the principles.
Decide a goal
“An important key to reading the source code is to set a concrete goal.”
These are the words of the author of Ruby, Matsumoto. Indeed, his words are very convincing to me. When the motivation is a spontaneous idea such as “Maybe I should read a kernel, at least…”, you get the source code unpacked or explanatory books ready on the desk. But not knowing what to do, the studies are left untouched. Haven’t you had this experience? On the other hand, when you have in mind “I’m sure there is a bug somewhere in this tool. I need to quickly fix it and make it work. Otherwise I will not be able to make the deadline…”, you will probably be able to fix the code in a blink, even if it’s written by someone else. Haven’t you?
The difference between these two cases is the motivation you have. In order to know something, you at least have to know what you want to know. Therefore, the very first step is to put what you want to know into explicit words.
Of course, this alone is not enough to make it your own “technique”, because a “technique” needs to be a common method that anybody can use by following it. In the following sections, I will explain how to bring the first step to the landing place where you finally achieve the goal.
Visualising the goal
Now let us suppose that our final goal is set to “Understand all about ruby”. This certainly counts as “one set goal”, but apparently it will not actually be useful for reading the source code. It will not trigger any concrete action. Therefore, your first job will be to drag the vague goal down to the level of a concrete thing.
Then how can we do it? The first way is to think as if you were the person who wrote the program. In this case you can utilize your knowledge of writing programs. For example, when you are reading a traditional “structured” program written by somebody, you will analyze it by employing the strategies of structured programming too. That is, you will divide the target into pieces, little by little. If it is something circulating in an event loop such as a GUI program, first roughly browse the event loop and then try to find out the role of each event handler. Or, try to investigate the “M” of MVC (Model View Controller) first.
Second, it’s good to be aware of your method of analysis. Everybody has certain analysis methods, but they are often applied relying on experience or intuition. In what way can we read source code well? Thinking about the way itself and being aware of it are crucially important.
Well, what are such methods like? I will explain them in the next section.
Analysis methods
The methods for reading source code can be roughly divided into two: one is the static method and the other is the dynamic method. The static method is to read and analyze the source code without running the program. The dynamic method is to watch the actual behavior using tools like a debugger.
It’s better to start studying a program by dynamic analysis. That is because what you can see there is the “fact”. The results from static analysis, because the program is not actually run, may well be “predictions” to a greater or lesser extent. If you want to know the truth, you should start from watching the fact.
Of course, you don’t know whether the results of dynamic analysis really are the fact. The debugger could have a bug, or the CPU may not be working properly due to overheating. The conditions of your configuration could be wrong. However, the results of dynamic analysis should at least be closer to the fact than those of static analysis.
Dynamic analysis
Using the target program
You can’t start without the target program. First of all, you need to know in advance what the program is like and what its expected behaviors are.
Following the behavior using the debugger
If you want to see the paths of code execution and the data structures produced as a result, it’s quicker to look at the result by actually running the program than to emulate the behavior in your brain. In order to do so easily, use a debugger.
I would be happier if the data structure at runtime could be seen as a picture, but unfortunately we can scarcely find a tool for that purpose (tools available for free are especially few). For a snapshot of a comparatively simple structure, we might be able to write it out as text and convert it to a picture with a tool like graphviz (see doc/graphviz.html on the attached CD-ROM). But it’s very difficult to find a way to do general-purpose, real-time analysis.
Tracer
You can use a tracer if you want to trace the procedures that the code goes through. For C-language, there is a tool named ctrace (http://www.vicente.org/ctrace). For tracing system calls, you can use tools like strace (http://www.wi.leidenuniv.nl/~wichert/strace/), truss, and ktrace.
Print everywhere
There is the term “printf debugging”. This method also works for analysis other than debugging. If you are watching the history of one variable, for example, it may be easier to understand by looking at the dump of the output of embedded print statements than by tracking the variable with a debugger.
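The idea of watching the history of a variable through embedded print statements can be sketched as follows. The example traces a small gcd function written for this illustration; it is not code from ruby, and the optional log parameter is an assumption made so the output destination can be swapped.

```ruby
# Euclid's algorithm with an embedded print statement.
# `log` defaults to standard error so that the trace does not mix
# with the program's normal output.
def gcd(a, b, log = $stderr)
  while b != 0
    log.puts "gcd: a=#{a} b=#{b}"   # dump the variables on every iteration
    a, b = b, a % b
  end
  a
end

puts gcd(48, 18)
```

Reading the dumped lines from top to bottom shows how a and b change over time, which is often harder to see when stepping through the same loop in a debugger.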
Modifying the code and running it
Say, for example, in a place where the behavior is not easy to understand, just make a small change in some part of the code or a particular parameter and then re-run the program. Naturally it will change the behavior, so you will be able to infer the meaning of the code from it.
It goes without saying that you should also keep an original binary and do the same thing on both of them.
Static analysis
The importance of names
Static analysis is simply source code analysis. And source code analysis is really an analysis of names. File names, function names, variable names, type names, member names: a program is a bunch of names.
This may seem obvious, because naming is one of the most powerful tools for creating abstractions in programming, but keeping it in mind will make reading much more efficient.
Also, we’d like to know the coding rules beforehand to some extent. For example, in C language, extern functions often use a prefix to distinguish the type of function. And in object-oriented programs, function names sometimes carry information about where they belong in prefixes, which becomes valuable information (e.g. rb_str_length).
Reading documents
Sometimes a document that describes the internal structure is included. Pay particular attention to files named HACKING and the like.
Reading the directory structure
Look at the policy by which the directories are divided. Grasp the overview, such as how the program is structured and what the parts are.
Reading the file structure
While browsing (the names of) the functions, also look at the policy by which the files are divided. You should pay attention to the file names because they are like comments whose lifetime is very long.
Additionally, if a file contains several modules, the functions composing each module should be grouped together, so you can find out the module structure from the order of the functions.
Investigating abbreviations
As you encounter ambiguous abbreviations, make a list of them and investigate each of them as early as possible. For example, when “GC” is written, things will be very different depending on whether it means “Garbage Collection” or “Graphic Context”.
Abbreviations in a program are generally made by methods like taking the initial letters or dropping the vowels. In particular, abbreviations popular in the field of the target program are used without explanation, so you should become familiar with them at an early stage.
Understanding data structure
If you find both data and code, you should first investigate the data structure. In other words, when exploring code in C, it’s better to start with header files. And in this case, let’s make the most of our imagination from their file names. For example, if you find frame.h, it would probably be the stack frame definition.
Also, you can understand many things from the member names of a struct and their types. For example, if you find the member next, which points to its own type, then it will be a linked list. Similarly, when you find members such as parent, children, and sibling, then it must be a tree structure. When you find prev, it will be a stack.
Understanding the calling relationship between functions
After names, the next most important thing to understand is the relationships between functions. A visualization of the calling relationships is called a “call graph”, and it is very useful. For this, we’d like to utilize tools.
A text-based tool is sufficient, but it’s even better if a tool can generate diagrams. However, such tools are seldom available (free ones are especially few). When I analyzed ruby to write this book, I wrote a small command language and a parser in Ruby and generated diagrams half-automatically by passing the results to the tool named graphviz.
Reading functions
Read how the function works, so that you can explain concisely what it does. It’s good to read it part by part while looking at the diagram of the function relationships.
What is important when reading functions is not “what to read” but “what not to read”. The ease of reading is decided by how much code we can cut out. What exactly should be cut out? It is hard to understand without seeing an actual example, so it will be explained in the main part.
Additionally, when you don’t like the coding style, you can convert it by using a tool like indent.
Experimenting by modifying it as you like
It’s a mystery of the human body: when something is done using a lot of parts of your body, it persists easily in your memory. I think the reason why not a few people prefer manuscript paper to a keyboard is not only nostalgia; this fact is also related.
Therefore, because merely reading on a monitor is a very ineffective way to remember with our bodies, rewrite the code while reading. This often helps our bodies get used to the code relatively soon. If there are names or code you don’t like, rewrite them. If there’s a cryptic abbreviation, substitute it so that it is no longer abbreviated.
However, it goes without saying that you should also keep the original source aside and check it when something does not make sense along the way. Otherwise, you would be puzzling for hours over a simple mistake of your own. And since the purpose of rewriting is to get used to the code and not rewriting itself, please be careful not to get too enthusiastic.
Reading the history
A program often comes with a document about the history of changes. For example, if it is GNU software, there’s always a file named ChangeLog. This is the best resource for learning “the reason why the program is as it is”.
Alternatively, when a version control system like CVS or SCCS is used and you can access it, its utility value is higher than that of ChangeLog. Taking CVS as an example, cvs annotate, which displays the change that last modified a particular line, and cvs diff, which takes the difference from a specified version, are convenient.
Moreover, when there’s a mailing list or a newsgroup for developers, you should get the archives so that you can search them at any time, because they often contain the exact reason for a certain change. Of course, if you can search online, that is also sufficient.
The tools for static analysis
Since various tools are available for various purposes, I can’t describe them all. But if I had to choose only one, I’d recommend global. Its most attractive point is that its structure allows us to easily use it for other purposes. For instance, gctags, which comes with it, is actually a tool to create tag files, but you can use it to create a list of the function names contained in a file.
~/src/ruby% gctags class.c | awk '{print $1}'
SPECIAL_SINGLETON
SPECIAL_SINGLETON
clone_method
include_class_new
ins_methods_i
ins_methods_priv_i
ins_methods_prot_i
method_list
        :
        :
That said, this is just this author’s recommendation; you as a reader can use whichever tool you like. But in that case, you should choose a tool equipped with at least the following features.
list up the function names contained in a file
find the location from a function name or a variable name (it’s preferable if you can jump to the location)
function cross-reference
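The awk step in the gctags example simply keeps the first field of each line. As a rough sketch of the same filtering in Ruby (the helper name first_fields and the sample tag lines are made up for illustration; they are not real gctags output):

```ruby
# Keep the first whitespace-separated field of each line,
# which is the job awk '{print $1}' does in the pipeline above.
def first_fields(text)
  text.each_line.map { |line| line.split.first }.compact
end

# Hypothetical tag-like lines: name, line number, file, source text.
SAMPLE_TAGS = <<~EOS
  clone_method       40 class.c clone_method(mid, body, tbl)
  include_class_new 200 class.c include_class_new(module, super)
EOS

puts first_fields(SAMPLE_TAGS)
```

Blank lines yield no field and are dropped by compact, so the result is just the list of names.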
Build
Target version
The version of ruby described in this book is 1.7 (2002-09-12). For ruby, a version is stable if its minor version is an even number, and it is a development version if the minor version is odd. Hence, 1.7 is a development version. Moreover, 9/12 does not mark any particular milestone, so this version is not distributed as an official package. Therefore, to get this version, you can either get it from the CD-ROM attached to this book or from the support site \footnote{The support site of this book …… http://i.loveruby.net/ja/rhg/}, or you need to use CVS, which will be described later.
There are several reasons why it is not 1.6, the stable version, but 1.7. One is that 1.7 is easier to deal with because both the specification and the implementation are better organized. Secondly, it’s easier to use CVS if it is the edge of the development version. Additionally, it is likely that 1.8, the next stable version, will be out in the near future. And the last one is that investigating the edge puts us in a more pleasant mood.
Getting the source code
The archive of the target version is included in the attached CD-ROM. In the top directory of the CD-ROM, the following three versions are placed:
ruby-rhg.tar.gz
ruby-rhg.zip
ruby-rhg.lzh
I’d like you to use whichever one is convenient for you. Of course, whichever one you choose, the content is the same. For example, the tar.gz archive can be extracted as follows.
~/src% mount /mnt/cdrom
~/src% gzip -dc /mnt/cdrom/ruby-rhg.tar.gz | tar xf -
~/src% umount /mnt/cdrom
Compiling
Just by looking at the source code, you can “read” it. But in order to know the program, you need to actually use it, modify it and experiment with it. When experimenting, there’s no point unless you use the same version you are reading, so naturally you’d need to compile it yourself.
Therefore, from now on, I’ll explain how to compile. First, let’s start with the case of a Unix-like OS. There are several things to consider on Windows, so they will be described together in the next section. However, Cygwin runs on Windows but is almost Unix, so I’d like you to read this section for it.
Building on a Unix-like OS
A Unix-like OS is generally equipped with a C compiler, so following the procedure below works in most cases. Let us suppose ~/src/ruby is the place where the source code was extracted.
~/src/ruby% ./configure
~/src/ruby% make
~/src/ruby% su
~/src/ruby# make install
Below, I’ll describe several points to be careful about.
On some platforms like Cygwin and UX/4800, you need to specify the --enable-shared option at the configure phase, or linking will fail. --enable-shared is an option that moves most of ruby out of the command into a shared library (libruby.so).
~/src/ruby% ./configure --enable-shared
A detailed tutorial about building is included in doc/build.html of the attached CD-ROM; I’d like you to try building while reading it.
Building on Windows
When it comes to building on Windows, things become way more complicated. The source of the problem is that there are multiple build environments.
Visual C++
MinGW
Cygwin
Borland C++ Compiler
First, the Cygwin environment is closer to UNIX than to Windows, so you can follow the build procedure for a Unix-like OS.
If you’d like to compile with Visual C++, Visual C++ 5.0 or later is required. There’s probably no problem with version 6 or .NET.
MinGW, or Minimalist GNU for Windows, is a port of the GNU compiling environment (namely, gcc and binutils) to Windows. Cygwin ports the whole UNIX environment. In contrast, MinGW ports only the tools for compiling. Moreover, a program compiled with MinGW does not require any special DLL at runtime. This means the ruby compiled with MinGW can be treated exactly the same as the Visual C++ version.
Alternatively, for personal use, you can download version 5.5 of Borland C++ Compiler for free from the Borland site. \footnote{The Borland site: http://www.borland.co.jp} Because ruby started to support this environment fairly recently, there is some cause for anxiety, but there was no particular problem in the build test done before the publication of this book.
Then, among the above four environments, which one should we choose? First, the Visual C++ version is basically the least likely to cause problems, so I recommend it. If you have experience with UNIX, installing the whole of Cygwin and using it is good. If you have no experience with UNIX and you don’t have Visual C++, using MinGW is probably good.
Below, I’ll explain how to build with Visual C++ and MinGW, but only in outline. More detailed explanations, and how to build with Borland C++ Compiler, are included in doc/build.html of the attached CD-ROM, so I’d like you to check it when necessary.
Visual C++
Although we say Visual C++, the IDE is usually not used; we’ll build from the DOS prompt. In this case, we first need to initialize the environment variables so that Visual C++ itself can run. Since a batch file for this purpose comes with Visual C++, let’s execute it first.
C:\> cd "\Program Files\Microsoft Visual Studio .NET\Vc7\bin"
C:\Program Files\Microsoft Visual Studio .NET\Vc7\bin> vcvars32
This is the case of Visual C++ .NET. For version 6, it can be found in the following place.
C:\Program Files\Microsoft Visual Studio\VC98\bin\
After executing vcvars32, all you have to do is move to the win32\ folder of the ruby source tree and build. Below, let us suppose the source tree is in C:\src.
C:\> cd src\ruby
C:\src\ruby> cd win32
C:\src\ruby\win32> configure
C:\src\ruby\win32> nmake
C:\src\ruby\win32> nmake DESTDIR="C:\Program Files\ruby" install
Then the ruby command would be installed in C:\Program Files\ruby\bin\, and the Ruby libraries in C:\Program Files\ruby\lib\. Because ruby does not use the registry or anything similar, you can uninstall it by deleting C:\Program Files\ruby and everything below it.
MinGW
As described before, MinGW is only an environment for compiling, so general UNIX tools like sed or sh are not available. However, because they are necessary to build ruby, you need to obtain them from somewhere. For this, there are two options: Cygwin and MSYS (Minimal SYStem).
However, I can’t recommend MSYS because troubles occurred continuously in the build tests performed before the publication of this book. In contrast, with Cygwin the build passes very straightforwardly. Therefore, in this book, I’ll explain the way that uses Cygwin.
First, install MinGW and the entire set of development tools by using Cygwin’s setup.exe. Both Cygwin and MinGW are also included in the attached CD-ROM. \footnote{Cygwin and MinGW …… See also doc/win.html of the attached CD-ROM} After that, all you have to do is type the following from Cygwin’s bash prompt.
~/src/ruby% ./configure --with-gcc='gcc -mno-cygwin' \
                                 --enable-shared i386-mingw32
~/src/ruby% make
~/src/ruby% make install
That’s it. Here the configure line spans multiple lines, but in practice we’d write it on one line, and then the backslash is not necessary. It installs to \usr\local\ and below on the drive on which it is compiled. Because really complicated things happen around here, the explanation would be fairly long, so I explain it comprehensively in doc/build.html of the attached CD-ROM.
Building Details
Up to here, this has been a README-like description. This time, let’s look at exactly what is done by what we have been doing. However, the discussion here partially requires very advanced knowledge. If you can’t understand it, I’d like you to skip this and jump directly to the next section. It is written so that you can understand it by coming back after reading the entire book.
Now, on whichever platform, building ruby is separated into three phases: configure, make and make install. Considering an explanation of make install unnecessary, I’ll explain the configure phase and the make phase.
configure
First, configure. Its content is a shell script, and we detect the system parameters by using it. For example, it checks things like “whether the header file setjmp.h exists” or “whether alloca() is available”. The way to check is unexpectedly simple.
Target to check   Method
commands          execute it actually and then check $?
header files      if [ -f $includedir/stdio.h ]
functions         compile a small program and check whether linking succeeds
When differences are detected, they must somehow be reported to us. The first way to report them is the Makefile. If we provide a Makefile.in in which parameters are embedded in the form @param@, configure generates a Makefile in which they are substituted with the actual values. For example, as follows,
Makefile.in:  CFLAGS = @CFLAGS@
                     ↓
Makefile   :  CFLAGS = -g -O2
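The @param@ substitution itself is a plain text replacement. The following is only a sketch of the idea in Ruby, not how configure actually does it (the real configure is a shell script); the method name substitute_params and the value table are assumptions made for this illustration.

```ruby
# Replace every @name@ placeholder with its value from a table.
# Placeholders that are not in the table are left untouched.
def substitute_params(template, values)
  template.gsub(/@(\w+)@/) { |placeholder| values.fetch($1, placeholder) }
end

# Hypothetical detected values.
detected = { "CFLAGS" => "-g -O2" }

puts substitute_params("CFLAGS = @CFLAGS@", detected)
```

Leaving unknown placeholders as they are makes a missing parameter easy to spot in the generated file.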
Alternatively, it writes out information, for instance whether certain functions or particular header files exist, into a header file. Because the output file name can be changed, it differs from program to program, but in ruby it is config.h. I’d like you to confirm that this file is created after executing configure. Its content is something like this.
▼ config.h
         :
         :
#define HAVE_SYS_STAT_H 1
#define HAVE_STDLIB_H 1
#define HAVE_STRING_H 1
#define HAVE_MEMORY_H 1
#define HAVE_STRINGS_H 1
#define HAVE_INTTYPES_H 1
#define HAVE_STDINT_H 1
#define HAVE_UNISTD_H 1
#define _FILE_OFFSET_BITS 64
#define HAVE_LONG_LONG 1
#define HAVE_OFF_T 1
#define SIZEOF_INT 4
#define SIZEOF_SHORT 2
         :
         :
Each meaning is easy to understand. HAVE_xxxx_H indicates whether a certain header file exists, and SIZEOF_SHORT indicates the size in bytes of the C short type. Likewise, SIZEOF_INT indicates the byte length of int, and HAVE_OFF_T indicates whether the off_t type is defined.
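If you want to look over what configure detected, the #define lines are easy to read mechanically. Here is a minimal sketch in Ruby, assuming the simple “#define NAME VALUE” form shown above; parse_defines is a hypothetical helper, not part of ruby’s build.

```ruby
# Collect "#define NAME VALUE" lines into a Hash of strings.
# Lines in any other form are ignored.
def parse_defines(header_text)
  defines = {}
  header_text.each_line do |line|
    if line =~ /\A\s*#\s*define\s+(\w+)\s+(\S+)/
      defines[$1] = $2
    end
  end
  defines
end

# A fragment in the style of the config.h shown above.
SAMPLE_CONFIG_H = <<~EOS
  #define HAVE_STDLIB_H 1
  #define HAVE_OFF_T 1
  #define SIZEOF_INT 4
  #define SIZEOF_SHORT 2
EOS

p(parse_defines(SAMPLE_CONFIG_H)["SIZEOF_INT"])
```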
As we can understand from the above, configure does detect the differences, but it does not automatically absorb them. Bridging the differences is left to each programmer. For example, as follows,
▼ A typical usage of the HAVE_ macro
  24  #ifdef HAVE_STDLIB_H
  25  #include <stdlib.h>
  26  #endif
(ruby.h)
autoconf
configure is not a ruby-specific tool. Whether certain functions exist, whether certain header files exist, ... it is obvious that these tests follow a regular pattern. It would be wasteful if everyone who writes a program wrote their own distinct tool.
Here a tool named autoconf comes in. In a file named configure.in or configure.ac, write “I’d like to do these checks”, process it with autoconf, and an adequate configure is generated. The .in of configure.in is probably an abbreviation of input. It’s the same as the relationship between Makefile and Makefile.in. .ac is, of course, an abbreviation of AutoConf.
Illustrating the discussion up to here gives Figure 1.
Figure 1: The process until Makefile is created
For readers who want to know more details, I recommend “GNU Autoconf/Automake/Libtool” by Gary V. Vaughan, Ben Elliston, Tom Tromey and Ian Lance Taylor.
By the way, ruby’s configure is, as said before, generated by using autoconf, but not all the configure scripts in this world are generated with autoconf. They can be written by hand, or another generator tool can be used. Anyway, it’s sufficient if ultimately there are a Makefile, config.h and the other necessary files.
make
What is done in the second phase, make? Of course it compiles the source code of ruby, but looking at the output of make, I feel it does many other things. I’ll briefly explain the process.
1. compile the source code composing ruby itself
2. create the static library libruby.a gathering the crucial parts of ruby
3. create “miniruby”, which is an always statically-linked ruby
4. create the shared library libruby.so when --enable-shared
5. compile the extension libraries (under ext/) by using miniruby
6. at last, generate the real ruby
There are two reasons why it creates miniruby and ruby separately. The first is that compiling the extension libraries requires ruby. With --enable-shared, ruby itself is dynamically linked, so it may not be able to run immediately because of the library load paths. Therefore, miniruby, which is statically linked, is created and used during the build process.
The second reason is that, on a platform where we cannot use shared libraries, the extension libraries may be statically linked into ruby itself. In this case, ruby cannot be created before all extension libraries are compiled, but the extension libraries cannot be compiled without ruby. To resolve this dilemma, miniruby is used.
CVS
The ruby archive included in the attached CD-ROM is, like the official release package, just a snapshot: the appearance at one particular moment of ruby, which is a continuously changing program. How ruby has changed and why are not described there. Then how can we see the entire picture, including the past? We can do it by using CVS.
About CVS
CVS is, in short, the undo list of an editor. If the source code is managed with CVS, a past state can be restored at any time, and we can immediately find out who changed what, where, when and how. Generally, a program doing such a job is called a source code management system, and CVS is the most famous open-source source code management system in the world.
Since ruby is also managed with CVS, I’ll explain a little about the mechanism and usage of CVS. First, the most important ideas of CVS are the repository and the working copy. I said CVS is something like the undo list of an editor; in order to achieve this, the record of every change must be saved somewhere. The place that stores all of them is the “CVS repository”.
Put plainly, the repository is what gathers all the past source code. Of course, this is only the concept; in reality, to save space, it is stored in the form of one recent state plus the differences between changes (namely, patches). In any case, it is sufficient if we can obtain the state of a particular file at a particular moment at any time.
On the other hand, a “working copy” is the result of taking files out of the repository at a chosen point. There’s only one repository, but you can have multiple working copies. (Figure 2)
Figure 2: Repository and working copies
When you’d like to modify the source code, first take a working copy, edit it with an editor or the like, and “return” it. Then the change is recorded in the repository. Taking a working copy from the repository is called “checkout”, and returning it is called “checkin” or “commit” (Figure 3). By checking in, the change is recorded in the repository, and we can then obtain it at any time.
Figure 3: Checkin and Checkout
The biggest trait of CVS is that we can access it over networks. This means that if there’s a single server holding the repository, everyone can check in and check out over the internet at any time. But generally checkin access is restricted and we can’t do it freely.
Revision
How can we obtain a certain version from the repository? One way is to specify it by time. By requesting “give me the edge version as of that time”, it would select it. But in practice, we rarely specify by time. Most commonly, we use something named a “revision”.
“Revision” and “version” have almost the same meaning. But usually “version” is attached to the project itself, so using the word “version” can be confusing. Therefore, the word “revision” is used to indicate a somewhat smaller unit.
In CVS, a file just stored in the repository is revision 1.1. Check it out, modify it, check it in, and it becomes revision 1.2. Next it would be 1.3, then 1.4.
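To make the relationship between checkin, checkout and revision numbers concrete, here is a toy model in Ruby. It is only a sketch for one file: the class ToyRepository and its methods are invented for this illustration, it keeps every full content instead of differences, and it has nothing to do with how CVS is actually implemented.

```ruby
# A toy single-file repository: revisions are numbered 1.1, 1.2, ...
class ToyRepository
  def initialize(initial_content)
    @contents = [initial_content]     # index 0 holds revision 1.1
  end

  # The newest revision number, e.g. "1.3".
  def head_revision
    "1.#{@contents.size}"
  end

  # Record a new content and return its revision number.
  def checkin(new_content)
    @contents << new_content
    head_revision
  end

  # Return the content of the given revision (the newest by default).
  def checkout(revision = head_revision)
    minor = revision.split(".").last.to_i
    @contents.fetch(minor - 1)
  end
end

repo = ToyRepository.new("puts 'Hello'")
p(repo.checkin("puts 'Hello, World!'"))   # the revision number just recorded
p(repo.checkout("1.1"))                   # the past state is still available
```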
A simple usage example of CVS
Keeping the above in mind, I’ll talk about the usage of CVS very, very briefly. First, the cvs command is essential, so I’d like you to install it beforehand. The source code of cvs is included in the attached CD-ROM \footnote{cvs: archives/cvs-1.11.2.tar.gz}. How to install cvs is really far from the main line, so it won’t be explained here.
After installing it, let’s check out the source code of ruby as an experiment. Type the following commands when you are online.
% cvs -d :pserver:[email protected]:/src login
CVS Password: anonymous
% cvs -d :pserver:[email protected]:/src checkout ruby
No options were specified, so the edge version is automatically checked out. The true edge version of ruby should appear under ruby/.
Additionally, if you’d like to obtain the version of a certain day, you can use the -D option of cvs checkout. By typing the following, you can obtain a working copy of the version explained in this book.
% cvs -d :pserver:[email protected]:/src checkout -D 2002-09-12 ruby
Note that you have to write options immediately after checkout. If you write “ruby” first, it causes a strange error complaining about a “missing module”.
And with anonymous access like in this example, we cannot check in. To practice checking in, it’s good to create a (local) repository and store a “Hello, World!” program in it. The concrete way to store it is not explained here. The manual that comes with cvs is fairly friendly. As for books you can read in Japanese, I recommend the translation of “Open Source Development with CVS” by Karl Fogel and Moshe Bar.
The composition of ruby
The physical structure
Now it is time to start reading the source code, but what should we do first? Look over the directory structure. In most cases, the directory structure, meaning the source tree, directly indicates the module structure of the program. Abruptly searching for main() with grep and reading from the top in processing order is not smart. Of course finding main() is also important, but first let’s take time to run ls or head to grasp the whole picture.
Below is the appearance of the top directory immediately after checking out from the CVS repository. Names that end with a slash are subdirectories.
COPYING        compar.c       gc.c           numeric.c      sample/
COPYING.ja     config.guess   hash.c         object.c       signal.c
CVS/           config.sub     inits.c        pack.c         sprintf.c
ChangeLog      configure.in   install-sh     parse.y        st.c
GPL            cygwin/        instruby.rb    prec.c         st.h
LEGAL          defines.h      intern.h       process.c      string.c
LGPL           dir.c          io.c           random.c       struct.c
MANIFEST       djgpp/         keywords       range.c        time.c
Makefile.in    dln.c          lex.c          re.c           util.c
README         dln.h          lib/           re.h           util.h
README.EXT     dmyext.c       main.c         regex.c        variable.c
README.EXT.ja  doc/           marshal.c      regex.h        version.c
README.ja      enum.c         math.c         ruby.1         version.h
ToDo           env.h          misc/          ruby.c         vms/
array.c        error.c        missing/       ruby.h         win32/
bcc32/         eval.c         missing.h      rubyio.h       x68/
bignum.c       ext/           mkconfig.rb    rubysig.h
class.c        file.c         node.h         rubytest.rb
Recently programs themselves have become larger, and much software has its source divided into many subdirectories, but ruby has consistently used the top directory for a long time. It becomes problematic if there are too many files, but we can get used to this amount.
The files at the top level can be categorized into six groups:
documents
the source code of ruby itself
the tool to build ruby
standard extension libraries
standard Ruby libraries
the others
The source code and the build tool are obviously important. Aside from them, I’ll list what seems useful for us.
ChangeLog
The record of changes to ruby. This is very important when investigating the reason for a certain change.
README.EXT README.EXT.ja
How to create an extension library is described here, but in the course of it, things relating to the implementation of ruby itself are also written.
Dissecting Source Code
From now on, I’ll further split the source code of ruby itself into smaller pieces. As for the main files, their categorization is described in README.EXT, so I’ll follow it. What is not described there, I categorized myself.
Ruby Language Core
class.c      class relating API
error.c      exception relating API
eval.c       evaluator
gc.c         garbage collector
lex.c        reserved word table
object.c     object system
parse.y      parser
variable.c   constants, global variables, class variables
ruby.h       the main macros and prototypes of ruby
intern.h     the prototypes of the C API of ruby. intern seems to be an abbreviation of internal, but the functions written here can be used from extension libraries.
rubysig.h    the header file containing the macros relating to signals
node.h       the definitions relating to the syntax tree nodes
env.h        the definitions of the structs to express the context of the evaluator
These are the parts that compose the core of the ruby interpreter. Most of the files explained in this book are contained here. Considering the number of files in the whole of ruby, it is really only a few. But in terms of byte size, these files occupy 50% of the entire amount. In particular, eval.c is 200KB and parse.y is 100KB; these files are large.
Utility
dln.c        dynamic loader
regex.c      regular expression engine
st.c         hash table
util.c       libraries for radix conversions, sort and so on
These are utilities for ruby. However, some of them are so large that you could not imagine it from the word “utility”. For instance, regex.c is 120KB.
Implementation of ruby command
dmyext.c     dummy of the routine to initialize extension libraries (DuMmY EXTension)
inits.c      the entry point for core and the routine to initialize extension libraries
main.c       the entry point of ruby command (this is unnecessary for libruby)
ruby.c       the main part of ruby command (this is also necessary for libruby)
version.c    the version of ruby
The implementation of the ruby command, that is, of what runs when you type ruby on the command line and execute it. This is the part that, for instance, interprets the command line options. Aside from the ruby command, mod_ruby and vim are examples of commands that utilize the ruby core. These commands work by linking to the libruby library (.a/.so/.dll and so on).
Class Libraries
array.c      class Array
bignum.c     class Bignum
compar.c     module Comparable
dir.c        class Dir
enum.c       module Enumerable
file.c       class File
hash.c       class Hash (Its actual body is st.c)
io.c         class IO
marshal.c    module Marshal
math.c       module Math
numeric.c    class Numeric, Integer, Fixnum, Float
pack.c       Array#pack, String#unpack
prec.c       module Precision
process.c    module Process
random.c     Kernel#srand(), rand()
range.c      class Range
re.c         class Regexp (Its actual body is regex.c)
signal.c     module Signal
sprintf.c    ruby-specific sprintf()
string.c     class String
struct.c     class Struct
time.c       class Time
The implementations of the Ruby class libraries. What is listed here is basically implemented in exactly the same way as ordinary Ruby extension libraries. This means these libraries are also examples of how to write an extension library.
Files depending on a particular platform
bcc32/       Borland C++ (Win32)
beos/        BeOS
cygwin/      Cygwin (the UNIX simulation layer on Win32)
djgpp/       djgpp (the free developing environment for DOS)
vms/         VMS (an OS formerly released by DEC)
win32/       Visual C++ (Win32)
x68/         Sharp X680x0 series (OS is Human68k)
The code specific to each platform is stored here.
fallback functions
missing/
Files to supply the functions which are missing on each platform. Mainly functions of libc.
Logical Structure
Now, there are the above four groups, and the core can be divided further into three. First, the “object space”, which creates the object world of Ruby. Second, the “parser”, which converts Ruby programs (as text) into the internal format. Third, the “evaluator”, which drives Ruby programs. Both the parser and the evaluator are built on top of the object space; the parser converts a program into the internal format, and the evaluator runs the program. Let me explain them in order.
Object Space
The first one is the object space. This is very easy to understand, because everything it deals with basically resides in memory, so we can directly show or manipulate it by using functions. Therefore, in this book, the explanation will start with this part. Part 1 is from chapter 2 to chapter 7.
Parser
The second one is the parser. Probably some preliminary explanation is necessary for this.
The ruby command is the interpreter of the Ruby language. This means that, on invocation, it analyzes the input, which is text, and executes it accordingly. Therefore, ruby needs to be able to interpret the meaning of a program written as text, but unfortunately text is very hard for computers to understand. For computers, text files are merely byte sequences and nothing more. To comprehend the meaning of the text, some special gimmick is necessary. That gimmick is the parser. By passing through the parser, a Ruby program (as text) is converted into a ruby-specific internal representation which can be easily handled by the program.
This internal representation is called a “syntax tree”. A syntax tree expresses a program as a tree structure; for instance, Figure 4 shows how an if statement is expressed.
Figure 4: an if statement and its corresponding syntax tree
The parser will be described in Part 2 “Syntactic Analysis”. Part 2 is from chapter 10 to chapter 12. Its target file is only parse.y.
Evaluator
Objects are easy to understand because they are tangible. As for the parser, what it does is ultimately converting one data format into another, so it’s reasonably easy to understand. However, the third one, the evaluator, is completely elusive.
What the evaluator does is “executing” a program by following a syntax tree. This sounds easy, but what is “executing”? Answering this question precisely is fairly difficult. What is “executing an if statement”? What is “executing a while statement”? What does “assigning to a local variable” mean? We cannot understand the evaluator without answering all such questions clearly and precisely.
In this book, the evaluator will be discussed in Part 3 “Evaluate”. Its target file is eval.c. eval is an abbreviation of “evaluator”.
Now, I’ve briefly described the structure of ruby, but even with the ideas explained, that does not help much in understanding the behavior of the program. In the next chapter, we’ll start by actually using ruby.
The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License
Ruby Hacking Guide
Translated by Sebastian Krause
Chapter 1: Introduction
A Minimal Introduction to Ruby
Here the Ruby prerequisites are explained, which one needs to know in order to understand the first section. I won’t point out programming techniques or points one should be careful about. So don’t think you’ll be able to write Ruby programs just because you read this chapter. Readers who have prior experience with Ruby can skip this chapter.
We will talk about grammar extensively in the second section, hence I won’t delve into the finer points of grammar here. For hash literals and such, I’ll show only the most widely used notations. On principle I won’t omit things even if I can. This way the syntax becomes simpler. I won’t always say “We can omit this”.
Objects
Strings
Everything that can be manipulated in a Ruby program is an object. There are no primitives like Java’s int and long. For instance, if we write as below, it denotes a string object with content content.
"content"
I casually called it a string object, but to be precise this is an expression which generates a string object. Therefore, if we write it several times, each time another string object is generated.
"content"
"content"
"content"
Here three string objects with content content are generated.
By the way, objects just existing there can’t be seen by programmers. Let’s show how to print them on the terminal.
p("content")   # Shows "content"
Everything after a # is a comment. From now on, I’ll put the result of an expression in a comment after it.
p(……) calls the function p. It displays arbitrary objects “as such”. It’s basically a debugging function.
Precisely speaking, there are no functions in Ruby, but just for now we can think of it as a function. You can use functions wherever you are.
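What “as such” means can be checked with a small sketch. I am assuming here that p prints the string returned by the object’s inspect method followed by a newline, which is how current versions of ruby behave; the method name shown_by_p is made up for this illustration.

```ruby
# The text p prints for an object (assumption: p uses inspect).
def shown_by_p(obj)
  obj.inspect
end

p("content")                  # prints the string with its quotes
puts(shown_by_p("content"))   # prints the same line via inspect
p([1, "two"])                 # arrays are shown in literal-like form too
```

Because the quotes are kept, a string and a number with the same digits can be told apart at a glance, which is what makes p handy for debugging.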
Various Literals
Now, let’s explain some more of the expressions which directly generate objects, the so-called literals. First, the integers and floating point numbers.
# Integer
1
2
100
9999999999999999999999999   # Arbitrarily big integers
# Float
1.0
99.999
1.3e4     # 1.3×10^4
Don’t forget that these are all expressions which generate objects. I’m repeating myself, but there are no primitives in Ruby.
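One way to convince yourself that number literals generate objects is to call methods on them directly. This is only a small sketch; the method name literal_samples is invented here, and the predicates used (is_a?, integer?) are ordinary methods of current ruby.

```ruby
# Objects generated by the number literals shown above.
def literal_samples
  [1, 100, 9999999999999999999999999, 1.0, 1.3e4]
end

literal_samples.each do |obj|
  # Every literal yields an object, so we can ask it questions.
  p([obj, obj.is_a?(Numeric), obj.integer?])
end
```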
Below an array object is generated.
[1, 2, 3]
This program generates an array which consists of the three integers 1, 2 and 3 in that order. As the elements of an array can be arbitrary objects, the following is also possible.
[1, "string", 2, ["nested", "array"]]
And finally, a hash table is generated by the expression below.
{"key"=>"value", "key2"=>"value2", "key3"=>"value3"}
A hash table is a structure which expresses one-to-one relationships between arbitrary objects. The above line creates a table which stores the following relationships.
"key"   →  "value"
"key2"  →  "value2"
"key3"  →  "value3"
If we ask a hash table created in this way “What’s corresponding to key?”, it’ll answer “That’s value.” How can we ask? We use methods.
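As a small preview of asking such a question, here is a sketch using the [] notation for hash lookup; the method name sample_table is made up so that the table can be reused.

```ruby
# The table from the text.
def sample_table
  {"key"=>"value", "key2"=>"value2", "key3"=>"value3"}
end

p(sample_table["key"])           # "What's corresponding to key?"
p(sample_table["no such key"])   # a key without an entry gives nil
```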
Method Calls
We can call methods on an object. In C++ jargon they are member functions. I don’t think it’s necessary to explain what a method is. I’ll just explain the notation.
"content".upcase()
Here the upcase method is called on a string object (with content content). As upcase is a method which returns a new string with the small letters replaced by capital letters, we get the following result.
p("content".upcase())   # Shows "CONTENT"
Method calls can be chained.
"content".upcase().downcase()
Here the method downcase is called on the return value of "content".upcase().
There are no public fields (member variables) as in Java or C++. The object interface consists of methods only.
The Program
Top Level
In Ruby we can just write expressions and it becomes a program. One doesn’t need to define a main() as in C++ or Java.
p("content")
This is a complete Ruby program. If we put this into a file called first.rb we can execute it from the command line as follows.
% ruby first.rb
"content"
With the -e option of the ruby program we don’t even need to create a file.
% ruby -e 'p("content")'
"content"
By the way, the place where p is written is the lowest nesting level of the program, which means the highest level from the program’s standpoint, thus it’s called “top-level”. Having a top-level is a characteristic trait of Ruby as a scripting language.
In Ruby, one line is usually one statement. A semicolon at the end isn’t necessary. Therefore the program below is interpreted as three statements.
p("content")
p("content".upcase())
p("CONTENT".downcase())
When we execute it, it looks like this.
% ruby second.rb
"content"
"CONTENT"
"content"
Local Variables
In Ruby all variables and constants store references to objects. That’s why one can’t copy the content by assigning one variable to another variable. Variables of type Object in Java or pointers to objects in C++ are good to think of. However, you can’t change the value of each pointer itself.
In Ruby one can tell the classification (scope) of a variable by the beginning of its name. Local variables start with a small letter or an underscore. One can write assignments by using “=”.
str = "content"
arr = [1,2,3]
An initial assignment serves as a declaration; an explicit declaration is not necessary. Because variables don’t have types, we can assign any kind of object indiscriminately. The program below is completely legal.
lvar = "content"
lvar = [1,2,3]
lvar = 1
But even if we can, we don’t have to do it. If different kinds of objects are put in one variable, it tends to become difficult to read. In a real world Ruby program one doesn’t do this kind of thing without a good reason. The above was just an example for the sake of it.
Variable reference also has a pretty sensible notation.
str = "content"
p(str)           # Shows "content"
In addition, let’s check that a variable holds a reference by taking an example.
a = "content"
b = a
c = b
After we execute this program, all three local variables a, b and c point to the same object, a string object with content "content" created on the first line (Figure 1).
Figure 1: Ruby variables store references to objects
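The sharing can be observed by modifying the object through one variable and looking at it through another. This is only a sketch: the method name shared_reference_demo is made up, and dup is used merely so that the example also runs where string literals are frozen.

```ruby
# Returns the three variables after the object was modified through c.
def shared_reference_demo
  a = "content".dup   # dup: get a modifiable string object
  b = a
  c = b
  c << "!"            # modifies the object itself, not the variable c
  [a, b, c]
end

a, b, c = shared_reference_demo
p(a)             # the change made through c is visible through a
p(a.equal?(c))   # equal? checks whether both refer to the same object
```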
By the way, as these variables are called local, they should be local to somewhere, but we cannot talk about this scope without reading a bit further. Let’s say for now that the top level is one local scope.
Constants
Constants start with a capital letter. They can only be assigned once (at their creation).
Const = "content"
PI = 3.1415926535
p(Const)   # Shows "content"
I’d like to say that if we assign twice an error occurs. But there is just a warning, not an error. It is this way in order to avoid raising an error even when the same file is loaded twice in applications that manipulate Ruby programs themselves, for instance in development environments. Therefore, it is allowed due to practical requirements and there’s no other choice, but essentially there should be an error. In fact, up until version 1.1 there really was an error.
C = 1
C = 2   # There is a warning but ideally there should be an error.
A lot of people are fooled by the word constant. A constant only does not switch to another object once it is assigned. But it does not mean that the object it points to won’t change. The term “read only” might capture the concept better than “constant”.
By the way, to indicate that an object itself shouldn’t be changed, another means is used: freeze.
Figure 2: constant means read only
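The difference between the constant and the object it points to can be tried out as follows. This is a sketch under assumptions: the constant name GREETING is made up, and since the exact error class raised for a frozen object has differed between ruby versions, the example only prints whatever class it gets.

```ruby
GREETING = "content".dup   # dup: get a modifiable string object
GREETING << "!"            # allowed: the constant still points to the same object

GREETING.freeze            # from now on the object itself must not change
begin
  GREETING << "?"          # modifying a frozen object raises an error
rescue => e
  p(e.class)               # the class depends on the ruby version
end

p(GREETING)
```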
And the scope of constants also cannot be described yet. It will be discussed in the next section together with classes.
Control Structures
Since Ruby has a wide abundance of control structures, just lining them up can be a huge task. For now, I just mention that there are if and while.
if i < 10 then
  # body
end
while i < 10 do
  # body
end
In a conditional expression, only the two objects false and nil are false, and all other objects are true. 0 and the empty string are also true, of course.
It wouldn’t be wise if there were just false; there is also true. And it is of course true.
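The rule can be checked with a tiny helper. This is just a sketch; the method name truthy? is made up for the illustration.

```ruby
# Report which branch of an if statement an object selects.
def truthy?(obj)
  if obj then
    true
  else
    false
  end
end

p(truthy?(0))       # 0 is true in a condition
p(truthy?(""))      # so is the empty string
p(truthy?(nil))     # nil is false
p(truthy?(false))   # and so is false
```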
Classes and Methods
Classes
In an object oriented system, methods essentially belong to objects. That can hold only in an ideal world, though. In a normal program there are a lot of objects which have the same set of methods, and it would be an enormous amount of work if each object had to remember its own set of callable methods. Usually a mechanism like classes or multimethods is used to get rid of the duplication of definitions.
In Ruby, the concept of classes is used as the traditional way to bind objects and methods together. Namely, every object belongs to a class, and the methods which can be called are determined by the class. In this case, an object is called “an instance of the XX class”.
For example the string "str" is an instance of the String class. And on this String class the methods upcase, downcase, strip and many others are defined. So it looks as if each string object can respond to all these methods.
# They all belong to the String class,
# hence the same methods are defined
"content".upcase()
"This is a pen.".upcase()
"chapter II".upcase()
"content".length()
"This is a pen.".length()
"chapter II".length()
By the way, what happens if the called method isn’t defined? In a static language a compiler error occurs, but in Ruby there is a runtime exception. Let’s try it out. For this kind of program the -e option is handy.
% ruby -e '"str".bad_method()'
-e:1: undefined method `bad_method' for "str":String (NoMethodError)
When the method isn’t found, there’s apparently a NoMethodError.
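Because this is a runtime exception and not a compile error, the program can catch it while running. Here is a small sketch; the method name call_missing_method is made up, and the rescue clause is ordinary Ruby exception handling that is not explained in this chapter.

```ruby
# Call an undefined method and report the class of the resulting exception.
def call_missing_method
  "str".bad_method()
  :no_error
rescue NoMethodError => e
  e.class
end

p(call_missing_method)
```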
Alwayssaying“theupcasemethodofString”andsuchiscumbersome.Let’sintroduceaspecialnotationString#upcasereferstothemethodupcasedefinedintheclassString.
Bytheway,ifwewriteString.upcaseithasacompletelydifferentmeaningintheRubyworld.Whatcouldthatbe?Iexplainitinthenextparagraph.
Class Definition
Up to now we talked about already defined classes. We can of course also define our own classes. To define classes we use the class statement.
    class C
    end
This is the definition of a new class C. After we define it we can use it as follows.
    class C
    end
    c = C.new()   # create an instance of C and assign it to the variable c
Note that the notation for creating a new instance is not new C. The astute reader might think: Hmm, this C.new() really looks like a method call. In Ruby, the object-generating expressions are indeed just methods.
In Ruby, class names and constant names are the same. Then what is stored in the constant whose name is the same as a class name? In fact, it’s the class. In Ruby, everything a program can manipulate is an object. So of course classes are also expressed as objects. Let’s call these class objects. Every class is an instance of the class Class.
In other words, a class statement creates a new class object and assigns it to a constant named after the class. On the other hand, the generation of an instance references this constant and calls a method on this object (usually new). If we look at the example below, it’s pretty obvious that the creation of an instance doesn’t differ from a normal method call.
    S = "content"
    class C
    end

    S.upcase()   # Get the object the constant S points to and call upcase
    C.new()      # Get the object the constant C points to and call new
So new is not a reserved word in Ruby.
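Since a class is just an object stored in a constant, it can be copied into an ordinary variable like any other value. The following is a minimal sketch of that idea; the variable name klass is chosen only for illustration.

```ruby
class C
end

klass = C             # the class object is an ordinary value
obj = klass.new()     # new is just a method called on that object

p(C.class())          # Class
p(obj.class())        # C
p(klass.equal?(C))    # true: one object reachable under two names
```

Nothing here is special syntax: the third line would work the same way with any other method that class objects respond to.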
And we can also use p on an instance of a class, even immediately after its creation.
    class C
    end

    c = C.new()

    p(c)   # #<C:0x2acbd7e4>
It won’t display as nicely as a string or an integer, but it shows its class and its internal ID. This ID is the pointer value which points to the object.
Oh, I completely forgot to mention the notation of method names: Object.new means the class object Object and the new method called on the class itself. So Object#new and Object.new are completely different things, and we have to distinguish them strictly.
    obj = Object.new()   # Object.new
    obj.new()            # Object#new
In practice a method Object#new is almost never defined, so the second line will raise an error. Please regard this as an example of the notation.
Method Definition
Even if we can define classes, it is useless if we cannot define methods. Let’s define a method for our class C.
    class C
      def myupcase(str)
        return str.upcase()
      end
    end
To define a method we use the def statement. In this example we defined the method myupcase. The name of the only parameter is str. As with variables, it’s not necessary to write parameter types or the return type. And we can use any number of parameters.
Let’s use the defined method. By default, methods can be called from the outside.
    c = C.new()
    result = c.myupcase("content")
    p(result)   # Shows "CONTENT"
Of course, once you get used to it you don’t need to assign every time. The line below gives the same result.
    p(C.new().myupcase("content"))   # Also shows "CONTENT"
self
During the execution of a method, the information about who “itself” is (the instance on which the method was called) is always saved and can be picked up in self. It is like this in C++ or Java. Let’s check this out.
    class C
      def get_self()
        return self
      end
    end

    c = C.new()
    p(c)              # #<C:0x40274e44>
    p(c.get_self())   # #<C:0x40274e44>
As we see, the above two expressions return the exact same object. We could confirm that self is c during the method call on c.
Then how do we call a method on itself? What first comes to mind is calling via self.
    class C
      def my_p(obj)
        self.real_my_p(obj)   # calls a method on oneself
      end

      def real_my_p(obj)
        p(obj)
      end
    end

    C.new().my_p(1)   # Output 1
But always adding self when calling one’s own method is tedious. Hence, Ruby is designed so that one can omit the object on which the method is called (the receiver) whenever one calls a method on self.
    class C
      def my_p(obj)
        real_my_p(obj)   # You can call without specifying the receiver
      end

      def real_my_p(obj)
        p(obj)
      end
    end

    C.new().my_p(1)   # Output 1
Instance Variables
As the saying goes, “Objects are data and code”, so just being able to define methods would not be very useful. Each object must also be able to store data. In other words, instance variables. Or, in C++ jargon, member variables.
Following Ruby’s variable naming convention, the kind of variable is determined by the first few characters of its name. For instance variables it’s an @.
    class C
      def set_i(value)
        @i = value
      end

      def get_i()
        return @i
      end
    end

    c = C.new()
    c.set_i("ok")
    p(c.get_i())   # Shows "ok"
Instance variables differ a bit from the variables seen before: we can reference them without assigning (defining) them. To see what happens, we add the following lines to the code above.
    c = C.new()
    p(c.get_i())   # Shows nil
Calling get without set gives nil. nil is the object which indicates “nothing”. It’s mysterious that there’s really an object but it means nothing, but that’s just the way it is.
We can use nil like a literal as well.
    p(nil)   # Shows nil
initialize
As we saw before, when we call ‘new’ on a freshly defined class, we can create an instance. That’s fine, but sometimes we might want a customized instantiation. In this case we don’t change the new method; we define the initialize method. When we do this, it gets called within new.
    class C
      def initialize()
        @i = "ok"
      end
      def get_i()
        return @i
      end
    end
    c = C.new()
    p(c.get_i())   # Shows "ok"
Strictly speaking, this is the specification of the new method and not the specification of the language itself.
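Because initialize is called from within new, the arguments given to new are handed on to initialize. Here is a small sketch of that; the class Point and its methods are made up for this illustration.

```ruby
class Point
  def initialize(x, y)   # receives the arguments that were given to new
    @x = x
    @y = y
  end

  def to_a()
    return [@x, @y]
  end
end

pt = Point.new(1, 2)
p(pt.to_a())   # [1, 2]
```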
Inheritance
Classes can inherit from other classes. For instance String inherits from Object. In this book, we’ll indicate this relation by a vertical arrow as in Fig.3.
Figure 3: Inheritance
In the case of this illustration, the inherited class (Object) is called the superclass or superior class. The inheriting class (String) is called the subclass or inferior class. This terminology differs from C++ jargon, so be careful. But it’s the same as in Java.
Anyway, let’s try it out. Let our created class inherit from another class. To inherit from another class (or designate a superclass), write the following.
    class C < SuperClassName
    end
When we leave out the superclass, as in the cases before, the class Object tacitly becomes the superclass.
Now, why should we want to inherit? Of course to hand over methods. Handing over means that the methods which were defined in the superclass also work in the subclass as if they were defined there once more. Let’s check it out.
    class C
      def hello()
        return "hello"
      end
    end

    class Sub < C
    end

    sub = Sub.new()
    p(sub.hello())   # Shows "hello"
hello was defined in the class C, but we could call it on an instance of the class Sub as well. Of course we don’t need to assign variables. The above is the same as the line below.
    p(Sub.new().hello())
By defining a method with the same name, we can overwrite the method. In C++ and Object Pascal (Delphi) it’s only possible to overwrite functions explicitly defined with the keyword virtual, but in Ruby every method can be overwritten unconditionally.
    class C
      def hello()
        return "Hello"
      end
    end

    class Sub < C
      def hello()
        return "Hello from Sub"
      end
    end

    p(Sub.new().hello())   # Shows "Hello from Sub"
    p(C.new().hello())     # Shows "Hello"
We can inherit over several steps. For instance, as in Fig.4, Fixnum inherits every method from Object, Numeric and Integer. When there are methods with the same name, the nearer classes take preference. As there is no overloading by type at all, the rules are extremely straightforward.
Figure 4: Inheritance over multiple steps
In C++ it’s possible to create a class which inherits nothing, while in Ruby one has to inherit from the Object class either directly or indirectly. In other words, when we draw the inheritance relations it becomes a single tree with Object at the top. For example, when we draw a tree of the inheritance relations among the important classes of the basic library, it would look like Fig.5.
Figure 5: Ruby’s class tree
Once the superclass is appointed (in the definition statement) it’s impossible to change it. In other words, one can add a new class to the class tree but cannot change a position or delete a class.
Inheritance of Variables……?
In Ruby, (instance) variables aren’t inherited. Even if it tried to inherit them, a class does not know which variables are going to be used.
But when an inherited method is called (on an instance of a subclass), the assignment of instance variables happens, which means they become defined. Then, since the namespace of instance variables is completely flat for each instance, they can be accessed by a method of whichever class.
    class A
      def initialize()   # called during the processing of new()
        @i = "ok"
      end
    end

    class B < A
      def print_i()
        p(@i)
      end
    end

    B.new().print_i()   # Shows "ok"
If you can’t agree with this behavior, let’s forget about classes and inheritance. When there’s an instance obj of the class C, then think as if all the methods of the superclass of C are defined in C. Of course we keep the overwrite rule in mind. Then the methods of C get attached to the instance obj (Fig.6). This strong palpability is a specialty of Ruby’s object orientation.
Figure 6: A conception of a Ruby object
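The flat, per-instance namespace can be observed directly with Object#instance_variables, which lists the variables an object currently holds. The sketch below is only an illustration; the method set_j is invented for it, and the names are converted to strings because older and newer versions of ruby return them in different forms.

```ruby
class A
  def initialize()
    @i = "ok"          # runs inside new(), on whichever instance is being created
  end
end

class B < A
  def set_j()
    @j = "also ok"
  end
end

b = B.new()
before = b.instance_variables().map { |name| name.to_s() }
b.set_j()
after = b.instance_variables().map { |name| name.to_s() }.sort()

p(before)   # ["@i"]
p(after)    # ["@i", "@j"]
```

The variable @i assigned by a method of A and the variable @j assigned by a method of B end up side by side in the same instance, with no record of which class assigned them.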
Modules
Only a single superclass can be designated. So Ruby looks like single inheritance. But because of modules it has, in practice, an ability equivalent to multiple inheritance. Let’s explain these modules next.
In short, modules are classes for which a superclass cannot be designated and instances cannot be created. For the definition we write as follows.
    module M
    end
Here the module M was defined. Methods are defined exactly the same way as for classes.
    module M
      def myupcase(str)
        return str.upcase()
      end
    end
But because we cannot create instances, we cannot call them directly. To do that, we use the module by “including” it into other classes. Then we can deal with it as if the class had inherited from the module.
    module M
      def myupcase(str)
        return str.upcase()
      end
    end

    class C
      include M
    end

    p(C.new().myupcase("content"))   # "CONTENT" is shown
Even though no method was defined in the class C, we can call the method myupcase. It means it “inherited” the method of the module M. Inclusion is functionally completely the same as inheritance. There’s no limit on defining methods or accessing instance variables.
I said we cannot specify any superclass of a module, but other modules can be included.
    module M
    end

    module M2
      include M
    end
In other words, it’s functionally the same as appointing a superclass. But a class cannot come above a module. Only modules are allowed above modules.
The example below also contains the inheritance of methods.
    module OneMore
      def method_OneMore()
        p("OneMore")
      end
    end

    module M
      include OneMore

      def method_M()
        p("M")
      end
    end

    class C
      include M
    end

    C.new().method_M()         # Output "M"
    C.new().method_OneMore()   # Output "OneMore"
As with classes, when we sketch the inheritance it looks like Fig.7.
Figure 7: multilevel inclusion
Besides, the class C also has a superclass. How does it relate to modules? For instance, let’s think of the following case.
    # modcls.rb

    class Cls
      def test()
        return "class"
      end
    end

    module Mod
      def test()
        return "module"
      end
    end

    class C < Cls
      include Mod
    end

    p(C.new().test())   # "class"? "module"?
C inherits from Cls and includes Mod. Which will be shown in this case, "class" or "module"? In other words, which one is “closer”, the class or the module? We’d better ask Ruby about Ruby, so let’s execute it:
    % ruby modcls.rb
    "module"
Apparently a module takes preference over the superclass.
In general, in Ruby, when a module is included, it is inherited by being inserted in between the class and the superclass. As a picture it might look like Fig.8.
Figure 8: The relation between modules and classes
And if we also take the modules included in the module into account, it would look like Fig.9.
Figure 9: The relation between modules and classes (2)
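The order in which a class, its included modules and its superclass are searched can also be inspected from Ruby itself with Module#ancestors. The following sketch repeats the definitions of modcls.rb and looks only at the first three entries, since the rest of the list depends on the ruby version.

```ruby
class Cls
  def test()
    return "class"
  end
end

module Mod
  def test()
    return "module"
  end
end

class C < Cls
  include Mod
end

p(C.ancestors().first(3))   # [C, Mod, Cls]
p(C.new().test())           # "module"
```

Mod sits between C and Cls in the list, which is the same arrangement the figures describe.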
The Program revisited
Caution. This section is extremely important and explains elements which are not easy to get used to for programmers who have only used static languages before. For other parts just skimming is sufficient, but this part alone I’d like you to read carefully. The explanation will also be relatively careful.
Nesting of Constants
First, a review of constants. As a constant begins with a capital letter, the definition goes as follows.
    Const = 3
Now we reference the constant in this way.
    p(Const)   # Shows 3
Actually we can also write this.
    p(::Const)   # Shows 3 in the same way.
The :: in front shows that it’s a constant defined at the top level. You can think of the path in a filesystem. Assume there is a file vmunix in the root directory. Being at / one can write vmunix to access the file. One can also write /vmunix as its full path. It’s the same with Const and ::Const. At top level it’s okay to write only Const or to write the full path ::Const.
And what corresponds to a filesystem’s directories in Ruby? That should be class and module definition statements. However, mentioning both is cumbersome, so I’ll just subsume them under class definition. When one enters a class definition, the level for constants goes one deeper (as if entering a directory).
    class SomeClass
      Const = 3
    end

    p(::SomeClass::Const)   # Shows 3
    p(SomeClass::Const)     # The same. Shows 3
SomeClass is defined at top level. Hence one can reference it by writing either SomeClass or ::SomeClass. And as the constant Const nested in the class definition is a Const “inside SomeClass”, it becomes ::SomeClass::Const.
As we can create a directory in a directory, we can create a class inside a class. For instance like this:
    class C          # ::C
      class C2       # ::C::C2
        class C3     # ::C::C2::C3
        end
      end
    end
By the way, for a constant defined in a class definition statement, should we always write its full name? Of course not. As with the filesystem, if one is inside the same class definition one can skip the ::. It becomes like this:
    class SomeClass
      Const = 3
      p(Const)   # Shows 3.
    end
“What?” you might think. Surprisingly, even inside a class definition statement we can write code that is going to be executed. People who are used to only static languages will find this quite exceptional. I was also flabbergasted the first time I saw it.
Let’s add that we can of course also view a constant inside a method. The reference rules are the same as within the class definition (outside the method).
    class C
      Const = "ok"
      def test()
        p(Const)
      end
    end

    C.new().test()   # Shows "ok"
Everything is executed
Looking at the big picture, I want to write one more thing. In Ruby almost every part of a program is “executed”. Constant definitions, class definitions, method definitions and almost everything else are executed in the apparent order.
Look for instance at the following code. I used various constructions which have been used before.
     1:  p("first")
     2:
     3:  class C < Object
     4:    Const = "in C"
     5:
     6:    p(Const)
     7:
     8:    def myupcase(str)
     9:       return str.upcase()
    10:    end
    11:  end
    12:
    13:  p(C.new().myupcase("content"))
This program is executed in the following order:
     1: p("first")                 Shows "first"
     3: < Object                   The constant Object is referenced and the class object Object is gained
     3: class C                    A new class object with superclass Object is generated, and assigned to the constant C
     4: Const = "in C"             Assigning the value "in C" to the constant ::C::Const
     6: p(Const)                   Showing the constant ::C::Const hence "in C"
     8: def myupcase(...)...end    Define C#myupcase
    13: C.new().myupcase(...)      Refer the constant C, call the method new on it, and then myupcase on the return value
     9: return str.upcase()        Returns "CONTENT"
    13: p(...)                     Shows "CONTENT"
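One consequence of definitions being executed is that they can be placed under ordinary control flow. The sketch below records the order of execution in a constant and chooses between two method definitions at run time; the names LOG and VERBOSE_GREETING are hypothetical and exist only for this example.

```ruby
LOG = []                  # a top-level constant, visible in the class body below
VERBOSE_GREETING = true   # hypothetical switch, checked while the class body runs

LOG << "before class"

class C
  LOG << "inside class body"   # executed as soon as this line is reached

  if VERBOSE_GREETING
    def hello()
      return "Hello, nice to meet you"
    end
  else
    def hello()
      return "Hi"
    end
  end
end

LOG << "after class"

p(LOG)                # ["before class", "inside class body", "after class"]
p(C.new().hello())    # "Hello, nice to meet you"
```

Only the def statement in the branch that is actually taken gets executed, so only that definition of hello exists afterwards.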
The Scope of Local Variables
At last we can talk about the scope of local variables.
The top level, the interior of a class definition, the interior of a module definition and a method body each have a completely independent local variable scope. In other words, the lvar variables in the following program are all different variables, and they do not influence each other.
    lvar = 'toplevel'

    class C
      lvar = 'in C'
      def method()
        lvar = 'in C#method'
      end
    end

    p(lvar)   # Shows "toplevel"

    module M
      lvar = 'in M'
    end

    p(lvar)   # Shows "toplevel"
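The independence of the scopes works in the reading direction too: a method body cannot see a local variable of the top level. A small sketch follows, assuming nothing beyond the rules above; the method name read_lvar is made up, and begin/rescue is used here only to catch the resulting exception so that the program can continue.

```ruby
lvar = "toplevel"

def read_lvar()
  return lvar      # there is no local variable lvar in this scope
end

caught = nil
begin
  read_lvar()
rescue NameError => e
  caught = e.class()
end

p(caught)   # NameError
p(lvar)     # "toplevel"
```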
self as context
Previously, I said that during method execution oneself (the object on which the method was called) becomes self. That’s true, but only half true. Actually, during the execution of a Ruby program, self is always set wherever it is. It means there’s also a self at the top level or in a class definition statement.
For instance the self at the top level is main. It’s an instance of the Object class and nothing special. main is provided just so that something is set as self. There’s no deeper meaning attached to it.
Hence the top level’s self, i.e. main, is an instance of Object, so one can call the methods of Object there. And in Object the module Kernel is included. In there the function-flavor methods like p and puts are defined (Fig.10). That’s why one can call puts and p also at the top level.
Figure 10: main, Object and Kernel
Thus p isn’t a function, it’s a method. Because it is defined in Kernel, it can be called like a function as “its own” method wherever it is and no matter what the class of self is. Therefore, there aren’t functions in the true sense, there are only methods.
By the way, besides p and puts there are the function-flavor methods print, printf, sprintf, gets, fork, exec and many more with somewhat familiar names. When you look at the choice of names you might be able to imagine Ruby’s character.
Well, since self is set up everywhere, self should also be set in a class definition in the same way. The self in the class definition is the class itself (the class object). Hence it would look like this.
    class C
      p(self)   # C
    end
What should this be good for? In fact, we’ve already seen an example in which it is very useful. This one.
    module M
    end

    class C
      include M
    end
This include is actually a method call on the class object C. I haven’t mentioned it yet, but the parentheses around arguments can be omitted for method calls. And I omitted the parentheses around include so that it doesn’t look like a method call, because we had not finished the talk about class definition statements.
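To make the method call visible, include can be written with its parentheses, and the self of the class body can be kept in a constant for later inspection. This is only a sketch; the constant SELF_IN_BODY and the method hello are names invented for it.

```ruby
module M
  def hello()
    return "hello from M"
  end
end

class C
  include(M)            # the same call as "include M", with parentheses
  SELF_IN_BODY = self   # remember what self was inside the class body
end

p(C::SELF_IN_BODY.equal?(C))   # true
p(C.new().hello())             # "hello from M"
```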
Loading
In Ruby the loading of libraries also happens at runtime. Normally one writes this.
    require("library_name")
The impression isn’t false: require is a method. It’s not even a reserved word. When it is written this way, loading is executed on the line where it is written, and the execution is handed over to (the code of) the library. As there is no concept like Java packages in Ruby, when we’d like to separate the namespaces of library files, it is done by putting files into a directory.
    require("somelib/file1")
    require("somelib/file2")
And in the library, classes and such are usually defined with class statements or module statements. The constant scope of the top level is flat without the distinction of files, so one can see classes defined in another file without any special preparation. To partition the namespace of class names one has to explicitly nest modules as shown below.
    # example of the namespace partition of net library
    module Net
      class SMTP
        # ...
      end
      class POP
        # ...
      end
      class HTTP
        # ...
      end
    end
More about Classes
The talk about Constants still goes on
Up to now we used the filesystem metaphor for the scope of constants, but I want you to completely forget that.
There is more about constants. Firstly, one can also see constants in the “outer” class.
    Const = "ok"
    class C
      p(Const)   # Shows "ok"
    end
The reason why this is designed in this way is that it becomes useful when modules are used as namespaces. Let’s explain this by adding a few things to the previous example of the net library.
    module Net
      class SMTP
        # Uses Net::SMTPHelper in the methods
      end
      class SMTPHelper
        # Supports the class Net::SMTP
      end
    end
In such a case, it’s convenient if we can refer to it also from the SMTP class just by writing SMTPHelper, isn’t it? Therefore, it was concluded that “it’s convenient if we can see the outer classes”.
The outer class can be referenced no matter how deeply it is nested. When the same name is defined on different levels, the one which is found first from within will be referred to.
    Const = "far"
    class C
      Const = "near"   # This one is closer than the one above
      class C2
        class C3
          p(Const)     # "near" is shown
        end
      end
    end
There’s another way of searching constants. If the top level is reached when going further and further outside, then the class’s own superclass is searched for the constant.
    class A
      Const = "ok"
    end
    class B < A
      p(Const)   # "ok" is shown
    end
Really, that’s pretty complicated.
Let’s summarize. When looking up a constant, first the outer classes are searched, then the superclasses. This is quite contrived, but let’s assume a class hierarchy as follows.
    class A1
    end
    class A2 < A1
    end
    class A3 < A2
      class B1
      end
      class B2 < B1
      end
      class B3 < B2
        class C1
        end
        class C2 < C1
        end
        class C3 < C2
          p(Const)
        end
      end
    end
When the constant Const in C3 is referenced, it’s looked up in the order depicted in Fig.11.
Figure 11: Search order for constants
Be careful about one point. The superclasses of the classes outside, for instance A1 and B2, aren’t searched at all. If it’s outside once it’s always outside, and if it’s a superclass once it’s always a superclass. Otherwise, the number of classes searched would become too big and the behavior of such a complicated thing would become unpredictable.
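The rule that superclasses of outer classes are skipped can be checked with a smaller hierarchy. In the sketch below all class and constant names are hypothetical, and begin/rescue is used only to record whether the lookup failed.

```ruby
class Outer1
  Hidden = "in the superclass of an outer class"
end

class Outer2 < Outer1
  Visible = "in an outer class"

  class Base
    Inherited = "in the own superclass"
  end

  class Inner < Base
    FOUND = [Visible, Inherited]   # both of these lookups succeed

    begin
      p(Hidden)                    # Outer1 is never searched from here
      HIDDEN_FOUND = true
    rescue NameError
      HIDDEN_FOUND = false
    end
  end
end

p(Outer2::Inner::FOUND)          # ["in an outer class", "in the own superclass"]
p(Outer2::Inner::HIDDEN_FOUND)   # false
```

Visible is found because Outer2 is an outer class of Inner, and Inherited because Base is the superclass of Inner. Hidden belongs to the superclass of an outer class, so it is not reachable without writing Outer1::Hidden.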
Metaclasses
I said that a method can be called on anything that is an object. I also said that the methods that can be called are determined by the class of an object. Then shouldn’t there be a class for class objects? (Fig.12)
Figure 12: A class of classes?
In this kind of situation, in Ruby, we can check in practice. It’s because there’s “a method which returns the class (class object) to which an object itself belongs”, Object#class.
    p("string".class())   # String is shown
    p(String.class())     # Class is shown
    p(Object.class())     # Class is shown
Apparently String belongs to the class named Class. Then what’s the class of Class?
    p(Class.class())   # Class is shown
Again Class. In other words, whatever object it is, by following like .class().class().class() …, it would reach Class in the end, then it will stall in the loop (Fig.13).
Figure 13: The class of the class of the class…
Class is the class of classes. And what has a recursive structure as “X of X” is called a meta-X. Hence Class is a metaclass.
Metaobjects
Let’s change the target and think about modules. As modules are also objects, there also should be a class for them. Let’s see.
    module M
    end

    p(M.class())   # Module is shown
The class of a module seems to be Module. And what should be the class of the class Module?
    p(Module.class())   # Class
It’s again Class.
Now we change the direction and examine the inheritance relationships. What’s the superclass of Class and Module? In Ruby, we can find it out with Class#superclass.
    p(Class.superclass())    # Module
    p(Module.superclass())   # Object
    p(Object.superclass())   # nil
So Class is a subclass of Module. Based on these facts, Figure 14 shows the relationships between the important classes of Ruby.
Figure 14: The class relationship between the important Ruby classes
Up to now we used new and include without any explanation, but finally I can explain their true form. new is really a method defined for the class Class. Therefore on whatever class (because it is an instance of Class), new can be used immediately. But new isn’t defined in Module. Hence it’s not possible to create instances of a module. And since include is defined in the Module class, it can be called on both modules and classes.
These three classes Object, Module and Class are objects that support the foundation of Ruby. We can say that these three objects describe Ruby’s object world itself. Namely, they are objects which describe objects. Hence, Object, Module and Class are Ruby’s “meta-objects”.
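Where new comes from can be asked of Ruby directly. The following sketch uses Module#instance_method and UnboundMethod#owner, which exist in reasonably recent versions of ruby (an assumption about the reader's environment); M and C are throwaway definitions.

```ruby
module M
end

class C
end

p(Class.instance_method(:new).owner())   # Class
p(C.respond_to?(:new))                   # true
p(M.respond_to?(:new))                   # false
p(Class.ancestors().include?(Module))    # true
```

C responds to new because C is an instance of Class, while M is only an instance of Module and therefore has no new. The last line uses Array#include?, which merely shares its name with the include discussed in the text.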
Singleton Methods
I said that methods can be called on anything that is an object. I also said that the methods that can be called are determined by the object’s class. However, I think I also said that ideally methods belong to objects. Classes are just a means to eliminate the effort of defining the same method more than once.
Actually, in Ruby there’s also a means to define methods for individual objects (instances) independently of the class. To do this, you can write this way.
    obj = Object.new()
    def obj.my_first()
      puts("My first singleton method")
    end
    obj.my_first()   # Shows My first singleton method
As you already know, Object is the root of every class. It’s very unlikely that a method with such a weird name as my_first is defined in such an important class. And obj is an instance of Object. However, the method my_first can be called on obj. Hence we have without doubt created a method which has nothing to do with the class the object belongs to. These methods which are defined for each object individually are called singleton methods.
When are singleton methods used? First, they are used when defining something like the static methods of Java or C++. In other words, methods which can be used without creating an instance. These methods are expressed in Ruby as singleton methods of a class object.
For example, in UNIX there’s a system call unlink. This call deletes a file entry from the filesystem. In Ruby it can be used directly as the singleton method unlink of the File class. Let’s try it out.
    File.unlink("core")   # deletes the coredump
It’s cumbersome to say “the singleton method unlink of the object File”. We simply write File.unlink. Don’t mix it up and write File#unlink, or vice versa don’t write File.write for the method write defined in File.
▼ A summary of the method notation
    notation       the target object        example
    File.unlink    the File class itself    File.unlink("core")
    File#write     an instance of File      f.write("str")
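The same def obj.name form works when the object is a class object, which is how a “static method” can be written for one’s own class. The class Temperature and its methods below are invented for this sketch.

```ruby
class Temperature
  def initialize(celsius)
    @celsius = celsius
  end

  def celsius()            # Temperature#celsius, called on instances
    return @celsius
  end
end

# Temperature.from_fahrenheit, a singleton method of the class object
def Temperature.from_fahrenheit(f)
  return Temperature.new((f - 32) * 5 / 9)
end

t = Temperature.from_fahrenheit(212)
p(t.celsius())                               # 100
p(Temperature.respond_to?(:from_fahrenheit)) # true
p(t.respond_to?(:from_fahrenheit))           # false
```

The last two lines show the distinction drawn by the notation: from_fahrenheit belongs to the class object alone, not to its instances.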
Class Variables
Class variables were added to Ruby from 1.6 on; they are a relatively new mechanism. As with constants, they belong to a class, and they can be referenced and assigned from both the class and its instances. Let’s look at an example. The beginning of the name is @@.
    class C
      @@cvar = "ok"
      p(@@cvar)   # "ok" is shown

      def print_cvar()
        p(@@cvar)
      end
    end

    C.new().print_cvar()   # "ok" is shown
As the first assignment serves as the definition, a reference before an assignment like the one shown below leads to a runtime error. There is an @ in front, but the behavior differs completely from instance variables.
    % ruby -e '
    class C
      @@cvar
    end
    '
    -e:3: uninitialized class variable @@cvar in C (NameError)
Here I was a bit lazy and used the -e option. The program is the three lines between the single quotes.
Class variables are inherited. Or, saying it differently, a variable in a superior class can be assigned and referenced in the inferior class.
    class A
      @@cvar = "ok"
    end

    class B < A
      p(@@cvar)   # Shows "ok"
      def print_cvar()
        p(@@cvar)
      end
    end

    B.new().print_cvar()   # Shows "ok"
Global Variables
At last there are also global variables. They can be referenced from everywhere and assigned everywhere. The first letter of the name is a $.
    $gvar = "global variable"
    p($gvar)   # Shows "global variable"
As with instance variables, all kinds of names can be considered defined for global variables before assignments. In other words, a reference before an assignment gives nil and doesn’t raise an error.
Copyright © 2002-2004 Minero Aoki, All rights reserved.
English Translation: Sebastian Krause
The original work is Copyright © 2002-2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
Translated by Vincent ISAMBART
Chapter 2: Objects
Structure of Ruby objects
Guideline
From this chapter, we will begin actually exploring the ruby source code. First, as declared at the beginning of this book, we’ll start with the object structure.
What are the necessary conditions for objects to be objects? There could be many ways to explain what an object is, but there are only three conditions that are truly indispensable.
1. The ability to differentiate itself from other objects (an identity)
2. The ability to respond to messages (methods)
3. The ability to store internal state (instance variables)
In this chapter, we are going to confirm these three features one by one.
The target file is mainly ruby.h, but we will also briefly look at other files such as object.c, class.c or variable.c.
VALUE and object struct
In ruby, the body of an object is expressed by a struct and always handled via a pointer. A different struct type is used for each class, but the pointer type will always be VALUE (figure 1).
Figure 1: VALUE and struct
Here is the definition of VALUE:
▼ VALUE
    typedef unsigned long VALUE;
(ruby.h)
In practice, when using a VALUE, we cast it to the pointer to each object struct. Therefore if an unsigned long and a pointer have a different size, ruby will not work well. Strictly speaking, it will not work if there’s a pointer type that is bigger than sizeof(unsigned long). Fortunately, systems which cannot meet this requirement are rare nowadays, but some time ago it seems there were quite a few of them.
The structs, on the other hand, have several variations; a different struct is used based on the class of the object.
    struct RObject    all things for which none of the following applies
    struct RClass     class object
    struct RFloat     floating-point numbers
    struct RString    string
    struct RArray     array
    struct RRegexp    regular expression
    struct RHash      hash table
    struct RFile      IO, File, Socket, etc…
    struct RData      all the classes defined at C level, except the ones mentioned above
    struct RStruct    Ruby’s Struct class
    struct RBignum    big integers
For example, for a string object, struct RString is used, so we will have something like the following.
Figure 2: String object
Let’s look at the definition of a few object structs.
▼ Examples of object struct
    /* struct for ordinary objects */
    struct RObject {
        struct RBasic basic;
        struct st_table *iv_tbl;
    };

    /* struct for strings (instance of String) */
    struct RString {
        struct RBasic basic;
        long len;
        char *ptr;
        union {
            long capa;
            VALUE shared;
        } aux;
    };

    /* struct for arrays (instance of Array) */
    struct RArray {
        struct RBasic basic;
        long len;
        union {
            long capa;
            VALUE shared;
        } aux;
        VALUE *ptr;
    };
(ruby.h)
Before looking at every one of them in detail, let’s begin with something more general.
First, as VALUE is defined as unsigned long, it must be cast before being used as a pointer. That’s why Rxxxx() macros have been made for each object struct. For example, for struct RString there is RSTRING(), for struct RArray there is RARRAY(), etc… These macros are used like this:
    VALUE str = ....;
    VALUE arr = ....;
    RSTRING(str)->len;   /* ((struct RString*)str)->len */
    RARRAY(arr)->len;    /* ((struct RArray*)arr)->len */
Another important point to mention is that all object structs start with a member basic of type struct RBasic. As a result, if you cast this VALUE to struct RBasic*, you will be able to access the content of basic, regardless of the type of struct pointed to by VALUE.
Figure 3: struct RBasic
Because it is purposefully designed this way, struct RBasic must contain very important information for Ruby objects. Here is the definition for struct RBasic:
▼ struct RBasic
    struct RBasic {
        unsigned long flags;
        VALUE klass;
    };
(ruby.h)
flags are multipurpose flags, mostly used to register the struct type (for instance struct RObject). The type flags are named T_xxxx, and can be obtained from a VALUE using the macro TYPE(). Here is an example:
    VALUE str;
    str = rb_str_new();   /* creates a Ruby string (its struct is RString) */
    TYPE(str);            /* the return value is T_STRING */
All the type flags are named T_xxxx, like T_STRING for struct RString and T_ARRAY for struct RArray. They correspond very straightforwardly to the type names.
The other member of struct RBasic, klass, contains the class this object belongs to. As the klass member is of type VALUE, what is stored is (a pointer to) a Ruby object. In short, it is a class object.
Figure 4: object and class
The relation between an object and its class will be detailed in the “Methods” section of this chapter.
By the way, this member is named klass so as not to conflict with the reserved word class when the file is processed by a C++ compiler.
About struct types
I said that the type of struct is stored in the flags member of struct RBasic. But why do we have to store the type of struct? It’s to be able to handle all different types of struct via VALUE. If you cast a pointer to a struct to VALUE, as the type information does not remain, the compiler won’t be able to help. Therefore we have to manage the type ourselves. That’s the consequence of being able to handle all the struct types in a unified way.
OK, but the used struct is defined by the class, so why are the struct type and class stored separately? Being able to find the struct type from the class should be enough. There are two reasons for not doing this.
The first one is (I’m sorry for contradicting what I said before) that in fact there are structs that do not have a struct RBasic (i.e. they have no klass member). For example struct RNode, which will appear in the second part of the book. However, flags is guaranteed to be the first member even in special structs like this. So if you put the type of struct in flags, all the object structs can be differentiated in one unified way.
The second reason is that there is no one-to-one correspondence between class and struct. For example, all the instances of classes defined at the Ruby level use struct RObject, so finding a struct from a class would require keeping the correspondence between each class and struct. That’s why it’s easier and faster to put the information about the type in the struct.
The use of basic.flags
Regarding the use of basic.flags, because I feel bad saying it is the struct type “and such”, I’ll illustrate it entirely here (Figure 5). There is no need to understand everything right away, because this is prepared for the time when you will be wondering about it later.
Figure 5: Use of flags
When looking at the diagram, it looks like 21 bits are not used on 32 bit machines. On these additional bits, the flags FL_USER0 to FL_USER8 are defined, and are used for a different purpose for each struct. In the diagram I also put FL_USER0 (FL_SINGLETON) as an example.
Objects embedded in VALUE
As I said, VALUE is an unsigned long. As VALUE is a pointer, it may look like void* would also be all right, but there is a reason for not doing this. In fact, VALUE can also not be a pointer. The 6 cases for which VALUE is not a pointer are the following:
1. small integers
2. symbols
3. true
4. false
5. nil
6. Qundef
I’ll explain them one by one.
Small integers
All data are objects in Ruby, thus integers are also objects. But since there are so many kinds of integer objects, if each of them were expressed as a struct, it would risk slowing down execution significantly. For example, when incrementing from 0 to 50000, we would hesitate to create 50000 objects for only that purpose.
That’s why in ruby, integers that are small to some extent are treated specially and embedded directly into VALUE. “Small” means signed integers that can be held in sizeof(VALUE)*8-1 bits. In other words, on 32 bit machines, the integers have 1 bit for the sign and 30 bits for the integer part. Integers in this range will belong to the Fixnum class and the other integers will belong to the Bignum class.
Let’s see in practice the INT2FIX() macro that converts from a C int to a Fixnum, and confirm that Fixnum are directly embedded in VALUE.
▼ INT2FIX
    #define FIXNUM_FLAG 0x01
    #define INT2FIX(i) ((VALUE)(((long)(i))<<1 | FIXNUM_FLAG))
(ruby.h)
In brief, shift 1 bit to the left, and bitwise or it with 1.
     110100001000    before conversion
    1101000010001    after conversion
That means that a Fixnum as VALUE will always be an odd number. On the other hand, as Ruby object structs are allocated with malloc(), they are generally arranged on addresses that are multiples of 4. So they do not overlap with the values of Fixnum as VALUE.
Also, to convert int or long to VALUE, we can use macros like INT2NUM() or LONG2NUM(). Any conversion macro XXXX2XXXX with a name containing NUM can manage both Fixnum and Bignum. For example, if INT2NUM() can’t convert an integer into a Fixnum, it will automatically convert it to Bignum. NUM2INT() will convert both Fixnum and Bignum to int. If the number can’t fit in an int, an exception will be raised, so there is no need to check the value range.
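The arithmetic of INT2FIX() can be replayed in Ruby itself, since Ruby integers support the same shift and bitwise-or operations. The methods int2fix and fix2int below are hypothetical Ruby models written for this illustration, not functions of ruby; fix2int simply undoes the shift.

```ruby
FIXNUM_FLAG = 0x01

def int2fix(i)    # hypothetical Ruby model of the C macro INT2FIX()
  return (i << 1) | FIXNUM_FLAG
end

def fix2int(v)    # hypothetical inverse: drop the flag bit again
  return v >> 1
end

v = int2fix(0b110100001000)
p(v.to_s(2))              # "1101000010001"
p(v.odd?())               # true
p(fix2int(int2fix(-5)))   # -5
```

The first result reproduces the bit pattern of the before/after table, and the last line shows that negative numbers survive the round trip as long as the right shift keeps the sign.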
SymbolsWhataresymbols?
Asthisquestionisquitetroublesometoanswer,let’sstartwiththereasonswhysymbolswerenecessary.Inthefirstplace,there’satypenamedIDusedinsideruby.Hereitis.
▼ID
72typedefunsignedlongID;
(ruby.h)
ThisIDisanumberhavingaone-to-oneassociationwithastring.However,it’snotpossibletohaveanassociationbetweenallstringsinthisworldandnumericalvalues.Itislimitedtotheonetoonerelationshipsinsideonerubyprocess.I’llspeakofthemethodtofindanIDinthenextchapter“Namesandnametables”.
Inlanguageprocessor,therearealotofnamestohandle.Methodnamesorvariablenames,constantnames,filenames,classnames…It’stroublesometohandleallofthemasstrings(char*),becauseofmemorymanagementandmemorymanagementandmemorymanagement…Also,lotsofcomparisonswouldcertainly
benecessary,butcomparingstringscharacterbycharacterwillslowdowntheexecution.That’swhystringsarenothandleddirectly,somethingwillbeassociatedandusedinstead.Andgenerallythat“something”willbeintegers,astheyarethesimplesttohandle.
TheseIDarefoundassymbolsintheRubyworld.Uptoruby1.4,thevaluesofIDconvertedtoFixnumwereusedassymbols.EventodaythesevaluescanbeobtainedusingSymbol#to_i.However,asrealuseresultscamepilingup,itwasunderstoodthatmakingFixnumandSymbolthesamewasnotagoodidea,sosince1.6anindependentclassSymbolhasbeencreated.
Symbolobjectsareusedalot,especiallyaskeysforhashtables.That’swhySymbol,likeFixnum,wasmadeembeddedinVALUE.Let’slookattheID2SYM()macroconvertingIDtoSymbolobject.
▼ ID2SYM
#define SYMBOL_FLAG 0x0e
#define ID2SYM(x) ((VALUE)(((long)(x))<<8|SYMBOL_FLAG))
(ruby.h)
When shifting 8 bits left, x becomes a multiple of 256, which means a multiple of 4. Then, after a bitwise or (in this case it’s the same as adding) with 0x0e (14 in decimal), the VALUE expressing the symbol is not a multiple of 4. Nor is it an odd number. So it does not overlap the range of any other VALUE. Quite a clever trick.
Finally, let’s see the reverse conversion of ID2SYM(), SYM2ID().
▼ SYM2ID()
#define SYM2ID(x) RSHIFT((long)x,8)
(ruby.h)
RSHIFT is a bit shift to the right. As a right shift may or may not keep the sign depending on the platform, it became a macro.
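The two conversions can be tried outside of ruby with a small standalone sketch. The names toy_value, toy_id2sym(), toy_is_symbol() and toy_sym2id() are hypothetical stand-ins and not ruby’s API, and the sketch uses unsigned values so that a plain right shift is enough.

```c
#include <assert.h>

typedef unsigned long toy_value;   /* hypothetical stand-in for VALUE */
typedef unsigned long toy_id;      /* same underlying type as ID */

#define TOY_SYMBOL_FLAG 0x0e

/* Pack an ID into a tagged value: shift left 8 bits, put 0x0e in the low byte. */
static toy_value toy_id2sym(toy_id id)
{
    return (toy_value)((id << 8) | TOY_SYMBOL_FLAG);
}

/* True when the low byte carries the symbol tag. */
static int toy_is_symbol(toy_value v)
{
    return (v & 0xff) == TOY_SYMBOL_FLAG;
}

/* Reverse conversion: drop the tag byte. */
static toy_id toy_sym2id(toy_value v)
{
    assert(toy_is_symbol(v));
    return (toy_id)(v >> 8);
}
```

For example, toy_id2sym(1) is 0x10e: an even number that is not a multiple of 4, as described above.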
true false nil
These three are Ruby special objects. true and false represent the boolean values. nil is an object used to denote that there is no object. Their values at the C level are defined like this:
▼ true false nil
#define Qfalse 0        /* Ruby's false */
#define Qtrue  2        /* Ruby's true */
#define Qnil   4        /* Ruby's nil */
(ruby.h)
This time they are even numbers, but as addresses as small as 0, 2 or 4 can’t be used by pointers, they can’t overlap with other VALUE. It’s because usually the first block of virtual memory is not allocated, to make the programs dereferencing a NULL pointer crash.
And as Qfalse is 0, it can also be used as false at the C level. In practice, in ruby, when a function returns a boolean value, it’s often declared to return an int or VALUE, and returns Qtrue/Qfalse.
For Qnil, there is a macro dedicated to checking whether a VALUE is Qnil or not, NIL_P().
▼ NIL_P()
#define NIL_P(v) ((VALUE)(v) == Qnil)
(ruby.h)
The name ending with p is a notation coming from Lisp denoting that it is a function returning a boolean value. In other words, NIL_P means “is the argument nil?”. It seems the “p” character comes from “predicate.” This naming rule is used at many different places in ruby.
Also, in Ruby, false and nil are false (in conditional statements) and all the other objects are true. However, in C, nil (Qnil) is true. That’s why there’s the RTEST() macro to do a Ruby-style test in C.
▼ RTEST()
#define RTEST(v) (((VALUE)(v) & ~Qnil) != 0)
(ruby.h)
As in Qnil only the third lower bit is 1, in ~Qnil only the third lower bit is 0. Then only Qfalse and Qnil become 0 with a bitwise and.
!= 0 has been added to be certain to only have 0 or 1, to satisfy the requirements of the glib library that only wants 0 or 1 ([ruby-dev:11049]).
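To convince yourself of the bit trick, here is a minimal sketch that can be compiled on its own. The toy_ names are hypothetical and only mirror the three constants above; toy_ctest() is added purely for comparison with a plain C test.

```c
#include <assert.h>

typedef unsigned long toy_value;   /* hypothetical stand-in for VALUE */

#define TOY_QFALSE ((toy_value)0)
#define TOY_QTRUE  ((toy_value)2)
#define TOY_QNIL   ((toy_value)4)

/* Ruby-style test: clear the one bit that separates nil (4) from false (0),
   then compare with 0 so the result is exactly 0 or 1. */
static int toy_rtest(toy_value v)
{
    return (v & ~TOY_QNIL) != 0;
}

/* Plain C test, for comparison: nil (4) is non-zero, hence "true". */
static int toy_ctest(toy_value v)
{
    return v != 0;
}

/* Usage: the interesting values behave as described in the text. */
static void toy_rtest_check(void)
{
    assert(!toy_rtest(TOY_QFALSE));
    assert(!toy_rtest(TOY_QNIL));
    assert(toy_rtest(TOY_QTRUE));
    assert(toy_ctest(TOY_QNIL));
}
```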
By the way, what is the ‘Q’ of Qnil? ‘R’ I would have understood, but why ‘Q’? When I asked, the answer was “Because it’s like that in Emacs.” I did not have the fun answer I was expecting…
Qundef
▼ Qundef
#define Qundef 6                /* undefined value for placeholder */
(ruby.h)
This value is used to express an undefined value in the interpreter. It can’t (must not) be found at all at the Ruby level.
Methods
I already brought up the three important points of a Ruby object: having an identity, being able to call a method, and keeping data for each instance. In this section, I’ll explain in a simple way the structure linking objects and methods.
struct RClass
In Ruby, classes exist as objects during the execution. Of course. So there must be a struct for class objects. That struct is struct RClass. Its struct type flag is T_CLASS.
As classes and modules are very similar, there is no need to differentiate their content. That’s why modules also use the struct RClass struct, and are differentiated by the T_MODULE struct flag.
▼ struct RClass
struct RClass {
    struct RBasic basic;
    struct st_table *iv_tbl;
    struct st_table *m_tbl;
    VALUE super;
};
(ruby.h)
First, let’s focus on the m_tbl (Method TaBLe) member. struct st_table is a hash table used everywhere in ruby. Its details will be explained in the next chapter “Names and name tables”, but basically, it is a table mapping names to objects. In the case of m_tbl, it keeps the correspondence between the names (ID) of the methods possessed by this class and the method entities themselves. As for the structure of the method entity, it will be explained in Part 2 and Part 3.
The fourth member super keeps, as its name suggests, the superclass. As it’s a VALUE, it’s (a pointer to) the class object of the superclass. In Ruby there is only one class that has no superclass (the root class): Object.
However, I already said that all Object methods are defined in the Kernel module; Object just includes it. As modules are functionally similar to multiple inheritance, it may seem that having just super is problematic, but in ruby some clever conversions are made to make it look like single inheritance. The details of this process will be explained in the fourth chapter “Classes and modules”.
Because of this conversion, super of the struct of Object points to the struct RClass which is the entity of the Kernel object, and the super of Kernel is NULL. So to put it conversely, if super is NULL, its RClass is the entity of Kernel (figure 6).
Figure 6: Class tree at the C level
Methods search
With classes structured like this, you can easily imagine the method call process. The m_tbl of the object’s class is searched, and if the method was not found, the m_tbl of super is searched, and so on. If there is no more super, that is to say the method was not found even in Object, then it must not be defined.
The sequential search process in m_tbl is done by search_method().
▼ search_method()
static NODE*
search_method(klass, id, origin)
    VALUE klass, *origin;
    ID id;
{
    NODE *body;

    if (!klass) return 0;
    while (!st_lookup(RCLASS(klass)->m_tbl, id, &body)) {
        klass = RCLASS(klass)->super;
        if (!klass) return 0;
    }

    if (origin) *origin = klass;
    return body;
}
(eval.c)
This function searches for the method named id in the class object klass.
RCLASS(value) is the macro doing:
((struct RClass*)(value))
st_lookup() is a function that searches in st_table for the value corresponding to a key. If the value is found, the function returns true and puts the found value at the address given in the third parameter (&body).
Nevertheless, doing this search each time, whatever the circumstances, would be too slow. That’s why in reality, once called, a method is cached. So starting from the second time it will be found without following super one by one. This cache and its search will be seen in the 15th chapter “Methods”.
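The walk along super can be reproduced in miniature. The following is only a sketch under simplifying assumptions: struct toy_class and struct toy_method are hypothetical, the method table is a plain array searched linearly instead of an st_table, and names are C strings instead of ID.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a class object: a tiny method table plus a super link. */
struct toy_method { const char *name; int body; };

struct toy_class {
    const struct toy_method *m_tbl;  /* methods defined directly in this class */
    size_t n_methods;
    const struct toy_class *super;   /* NULL when there is no superclass */
};

/* Lookup in one class only; plays the role st_lookup() has for m_tbl. */
static const struct toy_method *
toy_lookup(const struct toy_class *klass, const char *name)
{
    size_t i;
    for (i = 0; i < klass->n_methods; i++) {
        if (strcmp(klass->m_tbl[i].name, name) == 0) return &klass->m_tbl[i];
    }
    return NULL;
}

/* Follow super until the method is found or the chain ends. */
static const struct toy_method *
toy_search_method(const struct toy_class *klass, const char *name,
                  const struct toy_class **origin)
{
    const struct toy_method *body;

    assert(name != NULL);
    while (klass != NULL) {
        body = toy_lookup(klass, name);
        if (body != NULL) {
            if (origin != NULL) *origin = klass;  /* report where it was found */
            return body;
        }
        klass = klass->super;
    }
    return NULL;   /* not defined anywhere in the chain */
}
```

As in search_method(), the optional origin argument reports the class in which the method was actually found.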
Instance variables
In this section, I will explain the implementation of the third essential condition, instance variables.
rb_ivar_set()
Instance variables are the mechanism that allows each object to hold its specific data. Since they are specific to each object, it seems good to store them in each object itself (i.e. in its object struct), but is it really so? Let’s look at the function rb_ivar_set(), which assigns an object to an instance variable.
▼ rb_ivar_set()
/* assign val to the id instance variable of obj */
VALUE
rb_ivar_set(obj, id, val)
    VALUE obj;
    ID id;
    VALUE val;
{
    if (!OBJ_TAINTED(obj) && rb_safe_level() >= 4)
        rb_raise(rb_eSecurityError,
                 "Insecure: can't modify instance variable");
    if (OBJ_FROZEN(obj)) rb_error_frozen("object");
    switch (TYPE(obj)) {
      case T_OBJECT:
      case T_CLASS:
      case T_MODULE:
        if (!ROBJECT(obj)->iv_tbl)
            ROBJECT(obj)->iv_tbl = st_init_numtable();
        st_insert(ROBJECT(obj)->iv_tbl, id, val);
        break;
      default:
        generic_ivar_set(obj, id, val);
        break;
    }
    return val;
}
(variable.c)
rb_raise() and rb_error_frozen() are both error checks. This can always be said hereafter: error checks are necessary in reality, but they are not the main part of the process. Therefore, we should wholly ignore them at first read.
After removing the error handling, only the switch remains, but
switch (TYPE(obj)) {
  case T_aaaa:
  case T_bbbb:
    ...
}
this form is an idiom of ruby. TYPE() is the macro returning the type flag of the object struct (T_OBJECT, T_STRING, etc.). In other words, as the type flag is an integer constant, we can branch depending on it with a switch. Fixnum or Symbol do not have structs, but inside TYPE() a special treatment is done to properly return T_FIXNUM and T_SYMBOL, so there’s no need to worry.
Well, let’s go back to rb_ivar_set(). It seems only the treatments of T_OBJECT, T_CLASS and T_MODULE are different. These 3 have been chosen on the basis that their second member is iv_tbl. Let’s confirm it in practice.
▼ Structs whose second member is iv_tbl
/* TYPE(val) == T_OBJECT */
struct RObject {
    struct RBasic basic;
    struct st_table *iv_tbl;
};

/* TYPE(val) == T_CLASS or T_MODULE */
struct RClass {
    struct RBasic basic;
    struct st_table *iv_tbl;
    struct st_table *m_tbl;
    VALUE super;
};
(ruby.h)
iv_tbl is the Instance Variable TaBLe. It records the correspondences between the instance variable names and their values.
In rb_ivar_set(), let’s look again at the code for the structs having iv_tbl.
if (!ROBJECT(obj)->iv_tbl)
    ROBJECT(obj)->iv_tbl = st_init_numtable();
st_insert(ROBJECT(obj)->iv_tbl, id, val);
break;
ROBJECT() is a macro that casts a VALUE into a struct RObject*. It’s possible that what obj points to is actually a struct RClass, but when accessing only the second member, no problem will occur.
st_init_numtable() is a function creating a new st_table. st_insert() is a function doing associations in a st_table.
In conclusion, this code does the following: if iv_tbl does not exist, it creates it, then stores the [variable name → object] association.
There’s one thing to be careful about. As struct RClass is the struct of a class object, its instance variable table is for the class object itself. In Ruby programs, it corresponds to something like the following:
class C
  @ivar = "content"
end
generic_ivar_set()
What happens when assigning to an instance variable of an object whose struct is not one of T_OBJECT, T_MODULE or T_CLASS?
▼ rb_ivar_set() in the case there is no iv_tbl
default:
    generic_ivar_set(obj, id, val);
    break;
(variable.c)
This is delegated to generic_ivar_set(). Before looking at this function, let’s first explain its general idea.
Structs that are not T_OBJECT, T_MODULE or T_CLASS do not have an iv_tbl member (the reason why they do not have it will be explained later). However, even if a struct does not have the member, if there’s another way of linking an instance to a struct st_table, it would be able to have instance variables. In ruby, these associations are solved by using a global st_table, generic_iv_tbl (figure 7).
Figure 7: generic_iv_tbl
Let’s see this in practice.
▼ generic_ivar_set()
static st_table *generic_iv_tbl;

static void
generic_ivar_set(obj, id, val)
    VALUE obj;
    ID id;
    VALUE val;
{
    st_table *tbl;

    /* for the time being you can ignore this */
    if (rb_special_const_p(obj)) {
        special_generic_ivar = 1;
    }
    /* initialize generic_iv_tbl if it does not exist */
    if (!generic_iv_tbl) {
        generic_iv_tbl = st_init_numtable();
    }

    /* the process itself */
    if (!st_lookup(generic_iv_tbl, obj, &tbl)) {
        FL_SET(obj, FL_EXIVAR);
        tbl = st_init_numtable();
        st_add_direct(generic_iv_tbl, obj, tbl);
        st_add_direct(tbl, id, val);
        return;
    }
    st_insert(tbl, id, val);
}
(variable.c)
rb_special_const_p() is true when its parameter is not a pointer. However, as this if part requires knowledge of the garbage collector, we’ll skip it for now. I’d like you to check it again after reading the chapter 5 “Garbage collection”.
st_init_numtable() already appeared some time ago. It creates a new hash table.
st_lookup() searches a value corresponding to a key. In this case it searches for what’s attached to obj. If an attached value can be found, the whole function returns true and stores the value at the address (&tbl) given as third parameter. In short, !st_lookup(...) can be read “if a value can’t be found”.
st_insert() was also already explained. It stores a new association in a table.
st_add_direct() is similar to st_insert(), but it does not check if the key was already stored before adding an association. It means, in the case of st_add_direct(), if a key already registered is being used, two associations linked to this same key will be stored. We can use st_add_direct() only when the check for existence has already been done, or when a new table has just been created. And this code would meet these requirements.
FL_SET(obj, FL_EXIVAR) is the macro that sets the FL_EXIVAR flag in the basic.flags of obj. The basic.flags flags are all named FL_xxxx and can be set using FL_SET(). These flags can be unset with FL_UNSET(). The EXIVAR from FL_EXIVAR seems to be the abbreviation of EXternal Instance VARiable.
This flag is set to speed up the reading of instance variables. If FL_EXIVAR is not set, even without searching in generic_iv_tbl, we can see the object does not have any instance variables. And of course a bit check is way faster than searching a struct st_table.
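The combination of an external table and a flag bit can be sketched as follows. Everything here is hypothetical: the side table is a small fixed-size array instead of an st_table of st_tables, and toy_scans exists only to show how often the table is really searched.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FL_EXIVAR 0x1u
#define TOY_MAX_SLOTS 16

struct toy_obj { unsigned flags; };   /* no room for an ivar table inside */

/* Global side table: one slot per (object, id) pair. */
static struct { const struct toy_obj *obj; unsigned long id; long val; }
    toy_generic_iv[TOY_MAX_SLOTS];
static size_t toy_generic_iv_len;
static unsigned long toy_scans;       /* counts real searches of the side table */

static void toy_ivar_set(struct toy_obj *obj, unsigned long id, long val)
{
    size_t i;
    for (i = 0; i < toy_generic_iv_len; i++) {
        if (toy_generic_iv[i].obj == obj && toy_generic_iv[i].id == id) {
            toy_generic_iv[i].val = val;      /* overwrite the existing entry */
            return;
        }
    }
    assert(toy_generic_iv_len < TOY_MAX_SLOTS);
    toy_generic_iv[toy_generic_iv_len].obj = obj;
    toy_generic_iv[toy_generic_iv_len].id = id;
    toy_generic_iv[toy_generic_iv_len].val = val;
    toy_generic_iv_len++;
    obj->flags |= TOY_FL_EXIVAR;              /* remember that the side table has data */
}

/* Returns 1 and stores the value in *val when found, 0 otherwise. */
static int toy_ivar_get(const struct toy_obj *obj, unsigned long id, long *val)
{
    size_t i;
    if (!(obj->flags & TOY_FL_EXIVAR)) return 0;   /* one bit test, no search */
    toy_scans++;
    for (i = 0; i < toy_generic_iv_len; i++) {
        if (toy_generic_iv[i].obj == obj && toy_generic_iv[i].id == id) {
            *val = toy_generic_iv[i].val;
            return 1;
        }
    }
    return 0;
}
```

Reading a variable of an object that never had one set returns immediately, without touching the table.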
Gaps in structs
Now you understand the way to store the instance variables, but why are there structs without iv_tbl? Why is there no iv_tbl in struct RString or struct RArray? Couldn’t iv_tbl be part of RBasic?
To tell the conclusion first, we can do such a thing, but should not. As a matter of fact, this problem is deeply linked to the way ruby manages objects.
In ruby, the memory used for string data (char[]) and such is directly allocated using malloc(). However, the object structs are handled in a particular way. ruby allocates them by clusters, and then distributes them from these clusters. And in this way, if the types (or rather their sizes) were diverse, it would be hard to manage, thus RVALUE, which is the union of all the structs, is defined and the array of the unions is managed.
The size of a union is the same as the size of the biggest member, so for instance, if one of the structs is big, a lot of space would be wasted. Therefore, it’s preferable that each struct size is as similar as possible.
The most used struct is usually struct RString. After that, depending on each program, there comes struct RArray (array), RHash (hash), RObject (user defined object), etc. However, this struct RObject only uses the space of struct RBasic + 1 pointer. On the other hand, struct RString, RArray and RHash take the space of struct RBasic + 3 pointers. In other words, as the number of struct RObject increases, the memory space of two pointers is wasted for each object. Furthermore, if the size of RString were as much as 4 pointers, RObject would use less than half the size of the union, and this is too wasteful.
So the benefit of having iv_tbl in every struct is, more or less, some saving of memory and some speed-up. Furthermore, we do not know whether it would be used often or not. In fact, generic_iv_tbl was not introduced before ruby 1.2, so it was not possible to use instance variables in String or Array at that time. Nevertheless, it was not much of a problem. Making large amounts of memory useless just for such functionality looks stupid.
If you take all this into consideration, you can conclude that increasing the size of object structs for iv_tbl does not do any good.
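The waste can be measured with sizeof. The structs below are hypothetical miniatures that only imitate the shapes discussed above (a common header followed by one or three word-sized members); they are not ruby’s definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniatures: a common header plus one or three more members. */
struct toy_basic  { unsigned long flags; void *klass; };
struct toy_object { struct toy_basic basic; void *iv_tbl; };
struct toy_string { struct toy_basic basic; long len; char *ptr; long capa; };

/* Every slot of the object heap has the size of the largest member. */
union toy_rvalue {
    struct toy_object object;
    struct toy_string string;
};

/* Bytes left unused whenever a slot holds a toy_object. */
static size_t toy_wasted_per_object(void)
{
    assert(sizeof(union toy_rvalue) >= sizeof(struct toy_object));
    return sizeof(union toy_rvalue) - sizeof(struct toy_object);
}
```

Adding a member to the largest struct makes every slot bigger, whatever it holds, which is the argument against growing the structs for iv_tbl.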
rb_ivar_get()
We saw the rb_ivar_set() function that sets variables, so let’s see quickly how to get them.
▼ rb_ivar_get()
VALUE
rb_ivar_get(obj, id)
    VALUE obj;
    ID id;
{
    VALUE val;

    switch (TYPE(obj)) {
    /* (A) */
      case T_OBJECT:
      case T_CLASS:
      case T_MODULE:
        if (ROBJECT(obj)->iv_tbl &&
            st_lookup(ROBJECT(obj)->iv_tbl, id, &val))
            return val;
        break;
    /* (B) */
      default:
        if (FL_TEST(obj, FL_EXIVAR) || rb_special_const_p(obj))
            return generic_ivar_get(obj, id);
        break;
    }
    /* (C) */
    rb_warning("instance variable %s not initialized", rb_id2name(id));

    return Qnil;
}
(variable.c)
The structure is completely the same.
(A) For struct RObject or RClass, we search the variable in iv_tbl. As iv_tbl can also be NULL, we must check it before using it. Then if st_lookup() finds the relation, it returns true, so the whole if can be read as “If the instance variable has been set, return its value”.
(C) If no correspondence could be found, in other words if we read an instance variable that has not been set, we first leave the if, then the switch. rb_warning() will then issue a warning and nil will be returned. That’s because you can read instance variables that have not been set in Ruby.
(B) On the other hand, if the struct is neither struct RObject nor RClass, the instance variable table is searched in generic_iv_tbl. What generic_ivar_get() does can be easily guessed, so I won’t explain it. I’d rather want you to focus on the condition of the if statement.
I already told you that the FL_EXIVAR flag is set on the object on which generic_ivar_set() is used. Here, that flag is utilized to make the check faster.
And what is rb_special_const_p()? This function returns true when its parameter obj does not point to a struct. As no struct means no basic.flags, no flag can be set in the first place. Thus FL_xxxx() is designed to always return false for such an object. Hence, objects that are rb_special_const_p() should be treated specially here.
Object Structs
In this section, we’ll briefly see the concrete appearance of the important object structs and how to deal with them.
struct RString
struct RString is the struct for the instances of the String class and its subclasses.
▼ struct RString
struct RString {
    struct RBasic basic;
    long len;
    char *ptr;
    union {
        long capa;
        VALUE shared;
    } aux;
};
(ruby.h)
ptr is a pointer to the string, and len the length of that string. Very straightforward.
Rather than a string, Ruby’s string is more a byte array, and can contain any byte including NUL. So when thinking at the Ruby level, ending the string with NUL does not mean anything. But as C functions require NUL, for convenience the ending NUL is there. However, its size is not included in len.
When dealing with a string from the interpreter or an extension library, you can access ptr and len by writing RSTRING(str)->ptr or RSTRING(str)->len, and it is allowed. But there are some points to pay attention to.
1. you have to check if str really points to a struct RString by yourself beforehand
2. you can read the members, but you must not modify them
3. you can’t store RSTRING(str)->ptr in something like a local variable and use it later
Why is that? First, there is an important software engineering principle: Don’t arbitrarily tamper with someone’s data. When there are interface functions, we should use them. However, there are also concrete reasons in ruby’s design why you should not refer to or store a pointer, and that’s related to the fourth member aux. However, to explain properly how to use aux, we have to explain first a little more of Ruby’s strings’ characteristics.
Ruby’s strings can be modified (are mutable). By mutable I mean after the following code:
s = "str"        # create a string and assign it to s
s.concat("ing")  # append "ing" to this string object
p(s)             # show "string"
the content of the object pointed by s will become “string”. It’s different from Java or Python string objects. Java’s StringBuffer is closer.
And what’s the relation? First, mutable means the length (len) of the string can change. We have to increase or decrease the allocated memory size each time the length changes. We can of course use realloc() for that, but generally malloc() and realloc() are heavy operations. Having to realloc() each time the string changes is a huge burden.
That’s why the memory pointed by ptr has been allocated with a size a little bigger than len. Because of that, if the added part can fit into the remaining memory, it’s taken care of without calling realloc(), so it’s faster. The struct member aux.capa contains the length including this additional memory.
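Here is a minimal sketch of the len/capa idea, not ruby’s actual code. struct toy_str and toy_str_cat() are hypothetical, the policy of doubling the required size is an assumption made for the example, and the reallocs counter exists only to make the effect visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_str {
    long len;      /* bytes in use, not counting the trailing NUL */
    long capa;     /* bytes available for content, not counting the trailing NUL */
    char *ptr;
    int reallocs;  /* bookkeeping for the demonstration only */
};

/* Append n bytes; realloc() is called only when the spare room is exhausted. */
static int toy_str_cat(struct toy_str *s, const char *src, long n)
{
    assert(s != NULL && src != NULL && n >= 0);
    if (s->len + n > s->capa) {
        long new_capa = (s->len + n) * 2;   /* assumed growth policy */
        char *p = (char *)realloc(s->ptr, (size_t)new_capa + 1);
        if (p == NULL) return -1;
        s->ptr = p;
        s->capa = new_capa;
        s->reallocs++;
    }
    memcpy(s->ptr + s->len, src, (size_t)n);
    s->len += n;
    s->ptr[s->len] = '\0';                  /* convenience NUL, not counted in len */
    return 0;
}
```

Starting from an empty toy_str, appending "str" and then "ing" triggers a single realloc(): the second append fits in the spare room left by the first.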
So what is this other aux.shared? It’s to speed up the creation of literal strings. Have a look at the following Ruby program.
while true do      # repeat indefinitely
  a = "str"        # create a string with "str" as content and assign it to a
  a.concat("ing")  # append "ing" to the object pointed by a
  p(a)             # show "string"
end
Whatever the number of times you repeat the loop, the fourth line’s p has to show "string". And to do so, the expression "str" must every time create an object that holds a distinct char[]. But there must also be a high possibility that strings are not modified at all, and a lot of useless copies of char[] would be created in such a situation. If possible, we’d like to share one common char[].
The trick to share is aux.shared. Every string object created with a literal uses one shared char[]. And after a change occurs, the object-specific memory is allocated. When using a shared char[], the flag ELTS_SHARED is set in the object struct’s basic.flags, and aux.shared contains the original object. ELTS seems to be the abbreviation of ELemenTS.
Then, let’s return to our talk about RSTRING(str)->ptr. Though referring to a pointer is OK, you must not assign to it. This is first because the value of len or capa will no longer agree with the actual body, and also because when modifying strings created as literals, aux.shared has to be separated.
Before ending this section, I’ll write some examples of dealing with RString. I’d like you to regard str as a VALUE that points to RString when reading this.
RSTRING(str)->len;               /* length */
RSTRING(str)->ptr[0];            /* first character */
str = rb_str_new("content", 7);  /* create a string with "content" as its content
                                    the second parameter is the length */
str = rb_str_new2("content");    /* create a string with "content" as its content
                                    its length is calculated with strlen() */
rb_str_cat2(str, "end");         /* Concatenate a C string to a Ruby string */
struct RArray
struct RArray is the struct for the instances of Ruby’s array class Array.
▼ struct RArray
struct RArray {
    struct RBasic basic;
    long len;
    union {
        long capa;
        VALUE shared;
    } aux;
    VALUE *ptr;
};
(ruby.h)
Except for the type of ptr, this structure is almost the same as struct RString. ptr points to the content of the array, and len is its length. aux is exactly the same as in struct RString. aux.capa is the “real” length of the memory pointed by ptr, and if ptr is shared, aux.shared stores the shared original array object.
From this structure, it’s clear that Ruby’s Array is an array and not a list. So when the number of elements changes in a big way, a realloc() must be done, and if an element must be inserted at another place than the end, a memmove() will occur. But even so, it is so fast that we don’t notice it. Recent machines are really impressive.
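What “a memmove() will occur” means can be shown with a plain C array. toy_ary_insert() is a hypothetical helper, and the sketch assumes the caller has already made sure there is room for one more element.

```c
#include <assert.h>
#include <string.h>

/* Insert val at index pos in a[0..len-1]; the caller guarantees room for len + 1 elements. */
static long toy_ary_insert(long *a, long len, long pos, long val)
{
    assert(a != NULL && pos >= 0 && pos <= len);
    /* Shift the tail one slot to the right; memmove() copes with the overlap. */
    memmove(a + pos + 1, a + pos, (size_t)(len - pos) * sizeof a[0]);
    a[pos] = val;
    return len + 1;   /* new length */
}
```

Inserting at the end moves nothing, while inserting at index 0 moves every element, which is why insertion cost depends on the position.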
And the way to access its members is similar to that of RString. With RARRAY(arr)->ptr and RARRAY(arr)->len, you can refer to the members, and it is allowed, but you must not assign to them, etc. We’ll only look at simple examples:
/* manage an array from C */
VALUE ary;
ary = rb_ary_new();             /* create an empty array */
rb_ary_push(ary, INT2FIX(9));   /* push a Ruby 9 */
RARRAY(ary)->ptr[0];            /* look what's at index 0 */
rb_p(RARRAY(ary)->ptr[0]);      /* do p on ary[0] (the result is 9) */

# manage an array from Ruby
ary = []      # create an empty array
ary.push(9)   # push 9
ary[0]        # look what's at index 0
p(ary[0])     # do p on ary[0] (the result is 9)
struct RRegexp
It’s the struct for the instances of the regular expression class Regexp.
▼ struct RRegexp
struct RRegexp {
    struct RBasic basic;
    struct re_pattern_buffer *ptr;
    long len;
    char *str;
};
(ruby.h)
ptr is the compiled regular expression. str is the string before compilation (the source code of the regular expression), and len is this string’s length.
As no code handling Regexp objects appears in this book, we won’t see how to use it. Even if you use it in extension libraries, as long as you do not want to use it in a very particular way, the interface functions are enough.
struct RHash
struct RHash is the struct for Hash objects, which are Ruby’s hash tables.
▼ struct RHash
struct RHash {
    struct RBasic basic;
    struct st_table *tbl;
    int iter_lev;
    VALUE ifnone;
};
(ruby.h)
It’s a wrapper for struct st_table. st_table will be detailed in the next chapter “Names and name tables”.
ifnone is the value when a key does not have an associated value; its default is nil. iter_lev is to make the hash table reentrant (multithread safe).
struct RFile
struct RFile is a struct for instances of the built-in IO class and its subclasses.
▼ struct RFile
struct RFile {
    struct RBasic basic;
    struct OpenFile *fptr;
};
(ruby.h)
▼ OpenFile
typedef struct OpenFile {
    FILE *f;                    /* stdio ptr for read/write */
    FILE *f2;                   /* additional ptr for rw pipes */
    int mode;                   /* mode flags */
    int pid;                    /* child's pid (for pipes) */
    int lineno;                 /* number of lines read */
    char *path;                 /* pathname for file */
    void (*finalize) _((struct OpenFile*)); /* finalize proc */
} OpenFile;
(rubyio.h)
All the members have been moved into struct OpenFile. As there aren’t many instances of IO objects, it’s OK to do it like this. The purpose of each member is written in the comments. Basically, it’s a wrapper around C’s stdio.
struct RData
struct RData has a different tenor from what we saw before. It is the struct for the implementation of extension libraries.
Of course structs for classes created in extension libraries are necessary, but as the types of these structs depend on the created class, it’s impossible to know their size or structure in advance. That’s why a “struct for managing a pointer to a user defined struct” has been created on ruby’s side to manage this. This struct is struct RData.
▼ struct RData
struct RData {
    struct RBasic basic;
    void (*dmark) _((void*));
    void (*dfree) _((void*));
    void *data;
};
(ruby.h)
data is a pointer to the user defined struct, dfree is the function used to free that user defined struct, and dmark is the function to do the “mark” of mark and sweep.
Because explaining struct RData is still too complicated, for the time being let’s just look at its representation (figure 8). The detailed explanation of its members will be introduced after we finish chapter 5 “Garbage collection”.
Figure 8: Representation of struct RData
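The idea of “a struct managing a pointer to a user defined struct” can be imitated in a few lines. This is only a sketch: struct toy_data, struct toy_point and the functions below are hypothetical, the mark hook is left out, and toy_release() merely stands in for what a collector would do with an unreachable wrapper.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature: an opaque pointer plus the hook that knows how to free it. */
struct toy_data {
    void (*dfree)(void *);
    void *data;
};

/* A user defined struct that the managing side knows nothing about. */
struct toy_point { int x, y; };

static int toy_points_freed;   /* bookkeeping for the demonstration only */

static void toy_point_free(void *p)
{
    free(p);
    toy_points_freed++;
}

/* Wrap a freshly allocated point; returns 0 on success, -1 on allocation failure. */
static int toy_wrap_point(struct toy_data *d, int x, int y)
{
    struct toy_point *pt = (struct toy_point *)malloc(sizeof *pt);
    assert(d != NULL);
    if (pt == NULL) return -1;
    pt->x = x;
    pt->y = y;
    d->dfree = toy_point_free;
    d->data = pt;
    return 0;
}

/* Release the wrapped struct through the hook, without knowing its type. */
static void toy_release(struct toy_data *d)
{
    if (d->dfree != NULL && d->data != NULL) d->dfree(d->data);
    d->data = NULL;
}
```

The managing side only ever sees a void* and a function pointer, so it works for any struct an extension library might define.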
The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License
Translated by Clifford Escobar CAOILE
Chapter 3: Names and Name Table
st_table
st_table has already appeared several times as a method table and an instance table. In this chapter let’s look at the structure of the st_table in detail.
Summary
I previously mentioned that the st_table is a hash table. What is a hash table? It is a data structure that records one-to-one relations, for example, a variable name and its value, or a function name and its body, etc.
However, data structures other than hash tables can, of course, record one-to-one relations. For example, a list of the following structs will suffice for this purpose.
struct entry {
    ID key;
    VALUE val;
    struct entry *next;  /* point to the next entry */
};
However, this method is slow. If the list contains a thousand items, in the worst case, it is necessary to traverse a thousand links. In other words, the search time increases in proportion to the number of elements. This is bad. Since ancient times, various speed improvement methods have been conceived. The hash table is one of those improved methods. In other words, the point is not that the hash table is necessary but that it can be made faster.
Now then, let us examine the st_table. As it turns out, this library is not created by Matsumoto, rather:
▼ st.c credits
/* This is a public domain general purpose hash table package written by Peter Moore @ UCB. */
(st.c)
as shown above.
By the way, when I searched Google and found another version, it mentioned that st_table is a contraction of “STring TABLE”. However, I find it contradictory that it has both “general purpose” and “string” aspects.
What is a hash table?
A hash table can be thought of as follows. Let us think of an array with n items. For example, let us make n=64 (figure 1).
Figure 1: Array
Then let us specify a function f that takes a key and produces an integer i from 0 to n-1 (0-63). We call this f a hash function. f, when given the same key, always produces the same i. For example, if we can assume that the key is limited to positive integers, when the key is divided by 64, the remainder should always fall between 0 and 63. Therefore, this calculating expression has a possibility of being the function f.
When recording relationships, given a key, function f generates i, and places the value into index i of the array we have prepared. Index access into an array is very fast. The key concern is changing a key into an integer.
Figure 2: Array assignment
However, in the real world it isn’t that easy. There is a critical problem with this idea. Because n is only 64, if there are more than 64 relationships to be recorded, it is certain that there will be the same index for two different keys. It is also possible that with fewer than 64, the same thing can occur. For example, given the previous hash function “key % 64”, keys 65 and 129 will both have a hash value of 1. This is called a hash value collision. There are many ways to resolve such a collision.
One solution is to insert into the next element when a collision occurs. This is called open addressing (figure 3).
Figure 3: Open addressing
Other than using the array like this, there are other possible approaches, like using a pointer to a respective linked list in each element of the array. Then when a collision occurs, grow the linked list. This is called chaining (figure 4). st_table uses this chaining method.
Figure 4: Chaining
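Chaining with the “key % 64” function can be sketched in a few lines. This is not st.c: struct toy_entry, toy_bins and the functions below are hypothetical, keys are plain integers, and entries are never freed or resized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NBINS 64

struct toy_entry {
    unsigned long key;
    long val;
    struct toy_entry *next;   /* next entry in the same bin */
};

static struct toy_entry *toy_bins[TOY_NBINS];

/* The hash function f from the text: the remainder of a division by 64. */
static unsigned toy_hash(unsigned long key)
{
    return (unsigned)(key % TOY_NBINS);
}

/* Add a key without checking for duplicates; colliding keys extend the bin's list. */
static int toy_add(unsigned long key, long val)
{
    unsigned i = toy_hash(key);
    struct toy_entry *e = (struct toy_entry *)malloc(sizeof *e);
    if (e == NULL) return -1;
    e->key = key;
    e->val = val;
    e->next = toy_bins[i];
    toy_bins[i] = e;
    return 0;
}

/* Returns 1 and stores the value in *val when the key is present, 0 otherwise. */
static int toy_find(unsigned long key, long *val)
{
    const struct toy_entry *e;
    assert(val != NULL);
    for (e = toy_bins[toy_hash(key)]; e != NULL; e = e->next) {
        if (e->key == key) { *val = e->val; return 1; }
    }
    return 0;
}
```

Keys 65 and 129 both land in bin 1, yet both can be found again because the search compares the stored key of each entry in that bin’s list.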
However, if it can be determined a priori what set of keys will be used, it is possible to imagine a hash function that will never create collisions. This type of function is called a “perfect hash function”. Actually, there are tools which create a perfect hash function given a set of arbitrary strings. GNU gperf is one of those. ruby’s parser implementation uses GNU gperf but… this is not the time to discuss it. We’ll discuss this in the second part of the book.
Data Structure
Let us start looking at the source code. As written in the introductory chapter, if there is data and code, it is better to read the data first. The following is the data type of st_table.
▼ st_table
typedef struct st_table st_table;

struct st_table {
    struct st_hash_type *type;
    int num_bins;                   /* slot count */
    int num_entries;                /* total number of entries */
    struct st_table_entry **bins;   /* slot */
};
(st.h)
▼ struct st_table_entry
struct st_table_entry {
    unsigned int hash;
    char *key;
    char *record;
    st_table_entry *next;
};
(st.c)
st_table is the main table structure. st_table_entry is a holder that stores one value. st_table_entry contains a member called next which of course is used to make st_table_entry into a linked list. This is the chain part of the chaining method. The st_hash_type data type is used, but I will explain this later. First let me explain the other parts so you can compare and understand the roles.
Figure 5: st_table data structure
So, let us comment on st_hash_type.
▼ struct st_hash_type
struct st_hash_type {
    int (*compare)();   /* comparison function */
    int (*hash)();      /* hash function */
};
(st.h)
This is still Chapter 3 so let us examine it attentively.
int (*compare)()
This part shows, of course, the member compare which has a data type of “a pointer to a function that returns an int”. hash is also of the same type. A variable of this type is assigned in the following way:
int
great_function(int n)
{
    /* ToDo: Do something great! */
    return n;
}

{
    int (*f)();
    f = great_function;
    /* And it is called like this: */
    (*f)(7);
}
Here let us return to the st_hash_type commentary. Of the two members hash and compare, hash is the hash function f explained previously.
On the other hand, compare is a function that evaluates if the key is actually the same or not. With the chaining method, in the spot with the same hash value n, multiple elements can be inserted. To know exactly which element is being searched for, this time it is necessary to use a comparison function that we can absolutely trust. compare will be that function.
This st_hash_type is a good generalized technique. The hash table itself cannot determine what the stored keys’ data type will be. For example, in ruby, st_table’s keys are ID or char* or VALUE, but to write the same kind of hash for each (data type) is foolish. Usually, the things that change with the different key data types are things like the hash function. For things like memory allocation and collision detection, typically most of the code is the same. Only the parts where the implementation changes with a differing data type will be bundled up into a function, and a pointer to that function will be used. In this fashion, the majority of the code that makes up the hash table implementation can be shared.
In object-oriented languages, in the first place, you can attach a procedure to an object and pass it (around), so this mechanism is not necessary. Perhaps it is more correct to say that this mechanism is built-in as a language’s feature.
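To make the technique concrete, here is a sketch of a pair of function pointers bundled per key type. It is not st.c: struct toy_hash_type uses explicit prototypes taking const void*, and the string hash is a simple multiplicative one picked only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical counterpart of st_hash_type, with explicit prototypes. */
struct toy_hash_type {
    int (*compare)(const void *a, const void *b);   /* 0 means "same key" */
    unsigned (*hash)(const void *key);
};

static int toy_strcmp(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

static unsigned toy_strhash(const void *key)
{
    const unsigned char *p = (const unsigned char *)key;
    unsigned h = 0;
    while (*p != '\0') h = h * 31 + *p++;   /* illustration only */
    return h;
}

static int toy_longcmp(const void *a, const void *b)
{
    return *(const long *)a != *(const long *)b;
}

static unsigned toy_longhash(const void *key)
{
    return (unsigned)*(const long *)key;
}

static const struct toy_hash_type toy_type_str  = { toy_strcmp,  toy_strhash  };
static const struct toy_hash_type toy_type_long = { toy_longcmp, toy_longhash };

/* Generic code: picks a bin without knowing what kind of key it was given. */
static unsigned toy_bin_for(const struct toy_hash_type *type, const void *key,
                            unsigned num_bins)
{
    assert(type != NULL && num_bins > 0);
    return type->hash(key) % num_bins;
}
```

toy_bin_for() is written once and serves both key types; only the two small functions behind the pointers differ.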
st_hash_type example
The usage of a data structure like st_hash_type is good as an abstraction. On the other hand, what kind of code it actually passes through may be difficult to understand. If we do not examine what sort of function is used for hash or compare, we will not grasp the reality. To understand this, it is probably sufficient to look at st_init_numtable() introduced in the previous chapter. This function creates a table for integer data type keys.
▼ st_init_numtable()
st_table*
st_init_numtable()
{
    return st_init_table(&type_numhash);
}
(st.c)
st_init_table() is the function that allocates the table memory and so on. type_numhash is an st_hash_type (it is the member named “type” of st_table). Regarding this type_numhash:
▼ type_numhash
static struct st_hash_type type_numhash = {
    numcmp,
    numhash,
};

static int
numcmp(x, y)
    long x, y;
{
    return x != y;
}

static int
numhash(n)
    long n;
{
    return n;
}
(st.c)
Very simple. The table that the ruby interpreter uses is by and large this type_numhash.
st_lookup()
Now then, let us look at the function that uses this data structure. First, it’s a good idea to look at the function that does the searching. Shown below is the function that searches the hash table, st_lookup().
▼ st_lookup()
int
st_lookup(table, key, value)
    st_table *table;
    register char *key;
    char **value;
{
    unsigned int hash_val, bin_pos;
    register st_table_entry *ptr;

    hash_val = do_hash(key, table);
    FIND_ENTRY(table, ptr, hash_val, bin_pos);

    if (ptr == 0) {
        return 0;
    }
    else {
        if (value != 0)  *value = ptr->record;
        return 1;
    }
}
(st.c)
The important parts are pretty much in do_hash() and FIND_ENTRY(). Let us look at them in order.
▼ do_hash()
#define do_hash(key,table) (unsigned int)(*(table)->type->hash)((key))
(st.c)
Just in case, let us write down the macro body that is difficult to understand:
(table)->type->hash
is a function pointer where the key is passed as a parameter. This is the syntax for calling the function. * is not applied to table. In other words, this macro is a hash value generator for a key, using the prepared hash function type->hash for each data type.
Next, let us examine FIND_ENTRY().
▼ FIND_ENTRY()
#define FIND_ENTRY(table, ptr, hash_val, bin_pos) do {\
    bin_pos = hash_val%(table)->num_bins;\
    ptr = (table)->bins[bin_pos];\
    if (PTR_NOT_EQUAL(table, ptr, hash_val, key)) {\
        COLLISION;\
        while (PTR_NOT_EQUAL(table, ptr->next, hash_val, key)) {\
            ptr = ptr->next;\
        }\
        ptr = ptr->next;\
    }\
} while (0)

#define PTR_NOT_EQUAL(table, ptr, hash_val, key) ((ptr) != 0 && \
    (ptr->hash != (hash_val) || !EQUAL((table), (key), (ptr)->key)))

#define EQUAL(table,x,y) \
    ((x)==(y) || (*table->type->compare)((x),(y)) == 0)
(st.c)
COLLISION is a debug macro so we will (should) ignore it.
The parameters of FIND_ENTRY(), starting from the left are:
1. st_table
2. the found entry will be pointed to by this parameter
3. hash value
4. temporary variable
And, the second parameter will point to the found st_table_entry*.
At the outermost level, a do .. while(0) is used to safely wrap up a multiple expression macro. This is ruby’s, or rather, C language’s preprocessor idiom. In the case of if(1), there may be a danger of adding an else part. In the case of while(1), it becomes necessary to add a break at the very end.
Also, there is no semicolon added after the while(0).
FIND_ENTRY();
This is so that the semicolon that is normally written at the end of an expression will not go to waste.
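The benefit of the idiom is easiest to see when the macro is used in an if/else without braces. TOY_SWAP() and toy_order() below are hypothetical names for this sketch.

```c
#include <assert.h>

/* Several statements wrapped so that the macro behaves like one statement. */
#define TOY_SWAP(a, b) do { \
    int toy_tmp = (a);      \
    (a) = (b);              \
    (b) = toy_tmp;          \
} while (0)

/* Put two ints in ascending order; returns 1 when they had to be swapped. */
static int toy_order(int *lo, int *hi)
{
    int swapped = 1;

    assert(lo != 0 && hi != 0);
    if (*lo > *hi)
        TOY_SWAP(*lo, *hi);   /* the caller's semicolon ends the do/while statement */
    else
        swapped = 0;
    return swapped;
}
```

If the macro body were a bare { ... } block instead, the semicolon after TOY_SWAP(...) would become an empty statement and the following else would no longer compile.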
st_add_direct()
Continuing on, let us examine st_add_direct(), which is a function that adds a new relationship to the hash table. This function does not check if the key is already registered. It always adds a new entry. This is the meaning of direct in the function name.
▼ st_add_direct()
void
st_add_direct(table, key, value)
    st_table *table;
    char *key;
    char *value;
{
    unsigned int hash_val, bin_pos;

    hash_val = do_hash(key, table);
    bin_pos = hash_val % table->num_bins;
    ADD_DIRECT(table, key, value, hash_val, bin_pos);
}
(st.c)
Just as before, the do_hash() macro that obtains a hash value is called here. After that, the next calculation is the same as at the start of FIND_ENTRY(), which is to exchange the hash value for a real index.
Then the insertion operation seems to be implemented by ADD_DIRECT(). Since the name is all uppercase, we can anticipate that it is a macro.
▼ ADD_DIRECT()
#define ADD_DIRECT(table, key, value, hash_val, bin_pos) \
do {\
    st_table_entry *entry;\
    if (table->num_entries / (table->num_bins) \
                        > ST_DEFAULT_MAX_DENSITY) {\
        rehash(table);\
        bin_pos = hash_val % table->num_bins;\
    }\
    \
    /* (A) */ \
    entry = alloc(st_table_entry);\
    \
    entry->hash = hash_val;\
    entry->key = key;\
    entry->record = value;\
    /* (B) */ \
    entry->next = table->bins[bin_pos];\
    table->bins[bin_pos] = entry;\
    table->num_entries++;\
} while (0)
(st.c)
The first if is an exception case so I will explain it afterwards.
(A) Allocate and initialize a st_table_entry.
(B) Insert the entry into the start of the list. This is the idiom for handling the list. In other words,
    entry->next = list_beg;
    list_beg = entry;
makes it possible to insert an entry to the front of the list. This is similar to “cons-ing” in the Lisp language. Check for yourself that even if list_beg is NULL, this code holds true.
Now, let me explain the code I left aside.
▼ ADD_DIRECT() - rehash
    if (table->num_entries / (table->num_bins) \
                        > ST_DEFAULT_MAX_DENSITY) { \
        rehash(table); \
        bin_pos = hash_val % table->num_bins; \
    } \
(st.c)
DENSITY is “concentration”. In other words, this conditional checks if the hash table is “crowded” or not. In the st_table, as the number of values that use the same bin_pos increases, the longer the linked list becomes. In other words, search becomes slower. That is why, for a given bin count, when the average number of elements per bin becomes too large, the number of bins is increased and the crowding is reduced.
The current ST_DEFAULT_MAX_DENSITY is
▼ ST_DEFAULT_MAX_DENSITY
    #define ST_DEFAULT_MAX_DENSITY 5
(st.c)
Because of this setting, once the average number of st_table_entries per bin_pos exceeds 5 (the division is an integer division, so in practice once it reaches 6), the size will be increased.
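The insertion and rehash logic above can be modeled in a few lines of Ruby. This is only a sketch of the idea, not the code in st.c: the names ToyTable and Entry are invented for illustration, Ruby's own key.hash stands in for do_hash(), and the growth rule (roughly doubling the bin count) is an assumption, since rehash() itself has not been shown.

```ruby
# Toy model of a chained hash table in the style of st.c.
# ToyTable and Entry are hypothetical names, not part of ruby.
Entry = Struct.new(:hash_val, :key, :record, :next_entry)

class ToyTable
  MAX_DENSITY = 5 # mirrors ST_DEFAULT_MAX_DENSITY

  attr_reader :bins, :num_entries

  def initialize(num_bins = 2)
    @bins = Array.new(num_bins) # each slot holds the head of a list, or nil
    @num_entries = 0
  end

  # Always adds a new entry, without checking for an existing key.
  def add_direct(key, value)
    hash_val = key.hash
    rehash if @num_entries / @bins.size > MAX_DENSITY # integer division
    bin_pos = hash_val % @bins.size
    # (A) allocate the entry, (B) push it onto the front of the bin's list
    @bins[bin_pos] = Entry.new(hash_val, key, value, @bins[bin_pos])
    @num_entries += 1
  end

  private

  # Assumed growth policy: about twice as many bins, entries redistributed.
  def rehash
    old_bins = @bins
    @bins = Array.new(old_bins.size * 2 + 1)
    old_bins.each do |entry|
      while entry
        following = entry.next_entry
        pos = entry.hash_val % @bins.size
        entry.next_entry = @bins[pos]
        @bins[pos] = entry
        entry = following
      end
    end
  end
end

table = ToyTable.new(2)
13.times { |i| table.add_direct("key#{i}", i) }
puts table.bins.size # the 13th insertion saw 12 / 2 > 5 and rehashed to 5 bins
```

With 2 bins, the first 12 insertions never satisfy the condition (11 / 2 is 5), which is why the table only grows on the 13th.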
st_insert()
st_insert() is nothing more than a combination of st_add_direct() and st_lookup(), so if you understand those two, this will be easy.
▼ st_insert()
    int
    st_insert(table, key, value)
        register st_table *table;
        register char *key;
        char *value;
    {
        unsigned int hash_val, bin_pos;
        register st_table_entry *ptr;

        hash_val = do_hash(key, table);
        FIND_ENTRY(table, ptr, hash_val, bin_pos);

        if (ptr == 0) {
            ADD_DIRECT(table, key, value, hash_val, bin_pos);
            return 0;
        }
        else {
            ptr->record = value;
            return 1;
        }
    }
(st.c)
It checks if the element is already registered in the table. Only when it is not registered will it be added. If there is an insertion, it returns 0. If there is no insertion (the record of the existing entry is simply overwritten), it returns 1.
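The two return values are easy to check on a small model. The following Ruby sketch covers a single bin only; ToyBin, Entry and the method names are hypothetical and merely imitate the shape of the C code.

```ruby
# One bin of a chained hash table; ToyBin and Entry are illustrative names.
class ToyBin
  Entry = Struct.new(:key, :record, :next_entry)

  def initialize
    @head = nil
  end

  # Plays the role of FIND_ENTRY(): walk the list until the key matches.
  def find(key)
    ptr = @head
    ptr = ptr.next_entry while ptr && ptr.key != key
    ptr
  end

  def lookup(key)
    ptr = find(key)
    ptr && ptr.record
  end

  def insert(key, value)
    ptr = find(key)
    if ptr.nil?
      @head = Entry.new(key, value, @head) # the ADD_DIRECT() path
      0
    else
      ptr.record = value                   # key already present: overwrite
      1
    end
  end
end

bin = ToyBin.new
p bin.insert(:a, 1) # => 0
p bin.insert(:a, 2) # => 1
p bin.lookup(:a)    # => 2
```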
ID and Symbols
I’ve already discussed what an ID is. It is a correspondence between an arbitrary string of characters and a value. It is used to declare various names. The actual data type is unsigned int.
From char* to ID
The conversion from string to ID is executed by rb_intern(). This function is rather long, so let’s omit the middle.
▼ rb_intern() (simplified)
    static st_table *sym_tbl;       /*  char* to ID   */
    static st_table *sym_rev_tbl;   /*  ID to char*   */

    ID
    rb_intern(name)
        const char *name;
    {
        const char *m = name;
        ID id;
        int last;

        /* If for a name, there is a corresponding ID that is already
           registered, then return that ID */
        if (st_lookup(sym_tbl, name, &id))
            return id;

        /* omitted ... create a new ID */

        /* register the name and ID relation */
      id_regist:
        name = strdup(name);
        st_add_direct(sym_tbl, name, id);
        st_add_direct(sym_rev_tbl, id, name);
        return id;
    }
(parse.y)
The string and ID correspondence relationship can be accomplished by using the st_table. There probably isn’t any especially difficult part here.
What is the omitted section doing? It is treating global variable names and instance variable names as special and flagging them. This is because in the parser, it is necessary to know the variable’s classification from the ID. However, the fundamental part of ID is unrelated to this, so I won’t explain it here.
From ID to char*
The reverse of rb_intern() is rb_id2name(), which takes an ID and generates a char*. You probably know this, but the 2 in id2name is “to”. “To” and “two” have the same pronunciation, so “2” is used for “to”. This naming is often seen.
This function also sets the ID classification flags so it is long. Let me simplify it.
▼ rb_id2name() (simplified)
    char *
    rb_id2name(id)
        ID id;
    {
        char *name;

        if (st_lookup(sym_rev_tbl, id, &name))
            return name;
        return 0;
    }
Maybe it seems that it is a little over-simplified, but in reality if we remove the details it really becomes this simple.
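The pair of tables can be imitated with two Ruby hashes. This is a sketch under simplifying assumptions: ToyInterner is a made-up name, new IDs are just a running counter (the real code also encodes the variable classification into the ID), and a frozen copy of the string plays the part of strdup().

```ruby
# Toy model of rb_intern() / rb_id2name(); ToyInterner is a hypothetical name.
class ToyInterner
  def initialize
    @sym_tbl = {}     # name -> id, like sym_tbl
    @sym_rev_tbl = {} # id -> name, like sym_rev_tbl
    @last_id = 0
  end

  def intern(name)
    id = @sym_tbl[name]
    return id if id # already registered: return the same ID

    id = (@last_id += 1)     # assumed ID scheme: a simple counter
    name = name.dup.freeze   # the table keeps its own copy of the name
    @sym_tbl[name] = id
    @sym_rev_tbl[id] = name
    id
  end

  # Returns the stored name itself (no copy), or nil for an unknown ID.
  def id2name(id)
    @sym_rev_tbl[id]
  end
end

tbl = ToyInterner.new
foo = tbl.intern("foo")
p tbl.intern("foo") == foo # => true
p tbl.intern("bar") == foo # => false
p tbl.id2name(foo)         # => "foo"
```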
The point I want to emphasize is that the found name is not copied. The ruby API does not require (or rather, it forbids) the free()-ing of the return value. Also, when parameters are passed, it always copies them. In other words, the creation and release is completed by one side, either by the user or by ruby.
So then, when creation and release cannot be completed on one side (when a value is passed in and not returned), a Ruby object is used. I have not yet discussed it, but a Ruby object is automatically released when it is no longer needed, even if we are not taking care of the object.
Converting VALUE and ID
ID is shown as an instance of the Symbol class at the Ruby level. And it can be obtained like so: "string".intern. The implementation of String#intern is rb_str_intern().
▼ rb_str_intern()
    static VALUE
    rb_str_intern(str)
        VALUE str;
    {
        ID id;

        if (!RSTRING(str)->ptr || RSTRING(str)->len == 0) {
            rb_raise(rb_eArgError, "interning empty string");
        }
        if (strlen(RSTRING(str)->ptr) != RSTRING(str)->len)
            rb_raise(rb_eArgError, "string contains `\\0'");
        id = rb_intern(RSTRING(str)->ptr);
        return ID2SYM(id);
    }
(string.c)
This function is quite reasonable as a ruby class library code example. Please pay attention to the part where RSTRING() is used and casted, and where the data structure’s member is accessed.
Let’s read the code. First, rb_raise() is merely error handling so we ignore it for now. The rb_intern() we previously examined is here, and also ID2SYM is here. ID2SYM() is a macro that converts ID to Symbol.
And the reverse operation is accomplished using Symbol#to_s and such. The implementation is in sym_to_s.
▼ sym_to_s()
    static VALUE
    sym_to_s(sym)
        VALUE sym;
    {
        return rb_str_new2(rb_id2name(SYM2ID(sym)));
    }
(object.c)
SYM2ID() is the macro that converts Symbol (VALUE) to an ID.
It looks like the function is not doing anything unreasonable. However, it is probably necessary to pay attention to the area around the memory handling. rb_id2name() returns a char* that must not be free(). rb_str_new2() copies the parameter’s char* and uses the copy (and does not change the parameter). In this way the policy is consistent, which allows the line to be written just by chaining the functions.
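At the Ruby level the round trip looks as follows. This small sketch assumes a reasonably modern Ruby rather than the version read in this book; the last line reflects the copy made by rb_str_new2(), which is why every call to to_s hands back a separate String.

```ruby
sym = "string".intern
p sym                         # => :string
p sym.equal?("string".intern) # => true, the same name gives the same Symbol

str1 = sym.to_s
str2 = sym.to_s
p str1 == str2                # => true, same contents
p str1.equal?(str2)           # => false, each call builds a new String
```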
The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
Ruby Hacking Guide
Translated by Vincent ISAMBART
Chapter 4: Classes and modules
In this chapter, we’ll see the details of the data structures created by classes and modules.
Classes and methods definition
First, I’d like to have a look at how Ruby classes are defined at the C level. This chapter investigates almost only particular cases, so I’d like you to know first the way used most often.
The main API to define classes and modules consists of the following 6 functions:
rb_define_class()
rb_define_class_under()
rb_define_module()
rb_define_module_under()
rb_define_method()
rb_define_singleton_method()
There are a few other versions of these functions, but the extension libraries and even most of the core library are defined using just this API. I’ll introduce these functions to you one by one.
Class definition
rb_define_class() defines a class at the top-level. Let’s take the Ruby array class, Array, as an example.
▼ Array class definition
    VALUE rb_cArray;

    void
    Init_Array()
    {
        rb_cArray = rb_define_class("Array", rb_cObject);
(array.c)
rb_cObject and rb_cArray correspond respectively to Object and Array at the Ruby level. The added prefix rb shows that it belongs to ruby and the c that it is a class object. These naming rules are used everywhere in ruby.
This call to rb_define_class() defines a class called Array, which inherits from Object. At the same time as rb_define_class() creates the class object, it also defines the constant. That means that after this you can already access Array from a Ruby program. It corresponds to the following Ruby program:
    class Array < Object
I’d like you to note the fact that there is no end. It was written like this on purpose. It is because with rb_define_class() the body of the class has not been executed.
Nested class definition
After that, there’s rb_define_class_under(). This function defines a class nested in another class or module. This time the example is what is returned by stat(2), File::Stat.
▼ Definition of File::Stat
    VALUE rb_cFile;
    static VALUE rb_cStat;

    rb_cFile = rb_define_class("File", rb_cIO);
    rb_cStat = rb_define_class_under(rb_cFile, "Stat", rb_cObject);
(file.c)
This code corresponds to the following Ruby program:
    class File < IO
      class Stat < Object
This time again I omitted the end on purpose.
Module definition
rb_define_module() is simple so let’s end this quickly.
▼ Definition of Enumerable
    VALUE rb_mEnumerable;

    rb_mEnumerable = rb_define_module("Enumerable");
(enum.c)
The m in the beginning of rb_mEnumerable is similar to the c for classes: it shows that it is a module. The corresponding Ruby program is:
    module Enumerable
rb_define_module_under() is not used much so we’ll skip it.
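The effect of these three definitions can be observed from a Ruby program. The checks below are a sketch that assumes a current Ruby, where these particular relationships still hold.

```ruby
p Array.superclass               # => Object
p File.superclass                # => IO
p File::Stat.superclass          # => Object
p File.constants.include?(:Stat) # => true, Stat is a constant nested under File
p Enumerable.class               # => Module
```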
Method definition
This time the function is the one for defining methods, rb_define_method(). It’s used very often. We’ll take once again an example from Array.
▼ Definition of Array#to_s
    rb_define_method(rb_cArray, "to_s", rb_ary_to_s, 0);
(array.c)
With this the to_s method is defined in Array. The method body is given by a function pointer (rb_ary_to_s). The fourth parameter is the number of parameters taken by the method. As to_s does not take any parameters, it’s 0. If we write the corresponding Ruby program, we’ll have this:
    class Array < Object
      def to_s
        # content of rb_ary_to_s()
      end
    end
Of course the class part is not included in rb_define_method() and only the def part is accurate. But if there is no class part, it will look like the method is defined like a function, so I also wrote the enclosing class part.
One more example, this time taking a parameter:
▼ Definition of Array#concat
    rb_define_method(rb_cArray, "concat", rb_ary_concat, 1);
(array.c)
The class for the definition is rb_cArray (Array), the method name is concat, its body is rb_ary_concat() and the number of parameters is 1. It corresponds to the following Ruby program:
    class Array < Object
      def concat(str)
        # content of rb_ary_concat()
      end
    end
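The parameter count given as the fourth argument has a visible counterpart at the Ruby level, the arity of a method. The following sketch uses a hypothetical class Sample written in Ruby; the remark about -1 is my reading of the C API convention (a function receiving its arguments as an argument count and an array), which this text does not spell out.

```ruby
# Sample is an illustrative class, not something defined by ruby.
class Sample
  def no_args; end         # would be registered with 0
  def one_arg(other); end  # would be registered with 1
  def any_args(*args); end # variable length, the counterpart of -1
end

p Sample.instance_method(:no_args).arity  # => 0
p Sample.instance_method(:one_arg).arity  # => 1
p Sample.instance_method(:any_args).arity # => -1
```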
Singleton methods definition
We can define methods that are specific to a single object instance. They are called singleton methods. As I used File.unlink as an example in chapter 1 “Ruby language minimum”, I first wanted to show it here, but for a particular reason we’ll look at File.link instead.
▼ Definition of File.link
    rb_define_singleton_method(rb_cFile, "link", rb_file_s_link, 2);
(file.c)
It’s used like rb_define_method(). The only difference is that here the first parameter is just the “object” where the method is defined. In this case, it’s defined in rb_cFile.
Entry point
Being able to make definitions like before is great, but where are these functions called from, and by what means are they executed? These definitions are grouped in functions named Init_xxxx(). For instance, for Array a function Init_Array() like this has been made:
▼ Init_Array
    void
    Init_Array()
    {
        rb_cArray = rb_define_class("Array", rb_cObject);
        rb_include_module(rb_cArray, rb_mEnumerable);

        rb_define_singleton_method(rb_cArray, "allocate", rb_ary_s_alloc, 0);
        rb_define_singleton_method(rb_cArray, "[]", rb_ary_s_create, -1);
        rb_define_method(rb_cArray, "initialize", rb_ary_initialize, -1);
        rb_define_method(rb_cArray, "to_s", rb_ary_to_s, 0);
        rb_define_method(rb_cArray, "inspect", rb_ary_inspect, 0);
        rb_define_method(rb_cArray, "to_a", rb_ary_to_a, 0);
        rb_define_method(rb_cArray, "to_ary", rb_ary_to_a, 0);
        rb_define_method(rb_cArray, "frozen?", rb_ary_frozen_p, 0);
(array.c)
The Init functions for the built-in libraries are explicitly called during the startup of ruby. This is done in inits.c.
▼ rb_call_inits()
    void
    rb_call_inits()
    {
        Init_sym();
        Init_var_tables();
        Init_Object();
        Init_Comparable();
        Init_Enumerable();
        Init_Precision();
        Init_eval();
        Init_String();
        Init_Exception();
        Init_Thread();
        Init_Numeric();
        Init_Bignum();
        Init_Array();
(inits.c)
This way, Init_Array() is called properly.
That explains it for the built-in libraries, but what about extension libraries? In fact, for extension libraries the convention is the same. Take the following code:
    require "myextension"
With this, if the loaded extension library is myextension.so, at load time, the (extern) function named Init_myextension() is called. How they are called is beyond the scope of this chapter. For that, you should read chapter 18, “Load”. Here we’ll just end this with an example of Init.
The following example is from stringio, an extension library provided with ruby, that is to say not from a built-in library.
▼ Init_stringio() (beginning)
    void
    Init_stringio()
    {
        VALUE StringIO = rb_define_class("StringIO", rb_cData);
        rb_define_singleton_method(StringIO, "allocate", strio_s_allocate, 0);
        rb_define_singleton_method(StringIO, "open", strio_s_open, -1);
        rb_define_method(StringIO, "initialize", strio_initialize, -1);
        rb_enable_super(StringIO, "initialize");
        rb_define_method(StringIO, "become", strio_become, 1);
        rb_define_method(StringIO, "reopen", strio_reopen, -1);
(ext/stringio/stringio.c)
Singleton classes
rb_define_singleton_method()
You should now be able to more or less understand how normal methods are defined. Somehow making the body of the method, then registering it in m_tbl will do. But what about singleton methods? We’ll now look into the way singleton methods are defined.
▼ rb_define_singleton_method()
    void
    rb_define_singleton_method(obj, name, func, argc)
        VALUE obj;
        const char *name;
        VALUE (*func)();
        int argc;
    {
        rb_define_method(rb_singleton_class(obj), name, func, argc);
    }
(class.c)
As I explained, rb_define_method() is a function used to define normal methods, so the difference from normal methods is only rb_singleton_class(). But what on earth are singleton classes?
In brief, singleton classes are virtual classes that are only used to execute singleton methods. Singleton methods are functions defined in singleton classes. Classes themselves are in the first place (in a way) the “implementation” to link objects and methods, but singleton classes are even more on the implementation side. In the Ruby language way, they are not formally included, and don’t appear much at the Ruby level.
rb_singleton_class()
Well, let’s confirm what the singleton classes are made of. It’s too simple to just show you the code of a function each time so this time I’ll use a new weapon, a call graph.
    rb_define_singleton_method
        rb_define_method
        rb_singleton_class
            SPECIAL_SINGLETON
            rb_make_metaclass
                rb_class_boot
                rb_singleton_class_attached
Call graphs are graphs showing calling relationships among functions (or more generally procedures). The call graphs showing all the calls written in the source code are called static call graphs. The ones expressing only the calls done during an execution are called dynamic call graphs.
This diagram is a static call graph and the indentation expresses which function calls which one. For instance, rb_define_singleton_method() calls rb_define_method() and rb_singleton_class(). And this rb_singleton_class() itself calls SPECIAL_SINGLETON() and rb_make_metaclass(). In order to obtain call graphs, you can use cflow and such. {cflow: see also doc/callgraph.html in the attached CD-ROM}
In this book, because I wanted to obtain call graphs that contain only functions, I created a ruby-specific tool by myself. Perhaps it can be generalized by modifying its code analyzing part, thus I’d like to somehow make it until around the publication of this book. These situations are also explained in doc/callgraph.html of the attached CD-ROM.
Let’s go back to the code. When looking at the call graph, you can see that the calls made by rb_singleton_class() go very deep. Until now all call levels were shallow, so we could simply look at the functions without getting too lost. But at this depth, I easily forget what I was doing. In such a situation you should keep a call graph at hand to stay aware of where you are when reading. This time, as an example, we’ll decode the procedures below rb_singleton_class() in parallel. We should look out for the following two points:
What exactly are singleton classes?
What is the purpose of singleton classes?
Normal classes and singleton classes
Singleton classes are special classes: they’re basically the same as normal classes, but there are a few differences. We can say that finding these differences is explaining concretely singleton classes.
What should we do to find them? We should find the differences between the function creating normal classes and the one creating singleton classes. For this, we have to find the function for creating normal classes. As normal classes can be defined by rb_define_class(), it must call, in one way or another, a function to create normal classes. For the moment, we’ll not look at the content of rb_define_class() itself. I have some reasons to be interested in something that’s deeper. That’s why we will first look at the call graph of rb_define_class().
    rb_define_class
        rb_class_inherited
        rb_define_class_id
            rb_class_new
                rb_class_boot
            rb_make_metaclass
                rb_class_boot
                rb_singleton_class_attached
I’m interested by rb_class_new(). Doesn’t this name mean it creates a new class? Let’s confirm that.
▼ rb_class_new()
    VALUE
    rb_class_new(super)
        VALUE super;
    {
        Check_Type(super, T_CLASS);
        if (super == rb_cClass) {
            rb_raise(rb_eTypeError, "can't make subclass of Class");
        }
        if (FL_TEST(super, FL_SINGLETON)) {
            rb_raise(rb_eTypeError, "can't make subclass of virtual class");
        }
        return rb_class_boot(super);
    }
(class.c)
Check_Type() checks the type of the object structure, so we can ignore it. rb_raise() is error handling so we can ignore it. Only rb_class_boot() remains. So let’s look at it.
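Although we ignore the two error branches while reading, they can be seen from Ruby through Class.new. This sketch assumes a current Ruby, where both cases still raise TypeError; raised_class is a small helper invented for the example.

```ruby
# Hypothetical helper: returns the class of the exception raised by the
# block, or nil when nothing was raised.
def raised_class
  yield
  nil
rescue StandardError => e
  e.class
end

obj = Object.new
p raised_class { Class.new(Class) }               # => TypeError
p raised_class { Class.new(obj.singleton_class) } # => TypeError
p raised_class { Class.new(String) }              # => nil
p Class.new(String).superclass                    # => String
```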
▼ rb_class_boot()
    VALUE
    rb_class_boot(super)
        VALUE super;
    {
        NEWOBJ(klass, struct RClass);        /* allocates struct RClass */
        OBJSETUP(klass, rb_cClass, T_CLASS); /* initialization of the RBasic part */

        klass->super = super;       /* (A) */
        klass->iv_tbl = 0;
        klass->m_tbl = 0;
        klass->m_tbl = st_init_numtable();

        OBJ_INFECT(klass, super);
        return (VALUE)klass;
    }
(class.c)
NEWOBJ() and OBJSETUP() are fixed expressions used when creating Ruby objects that possess one of the built-in structure types (struct Rxxxx). They are both macros. In NEWOBJ(), struct RClass is created and the pointer is put in its first parameter klass. In OBJSETUP(), the struct RBasic member of the RClass (and thus basic.klass and basic.flags) is initialized.
OBJ_INFECT() is a macro related to security. From now on, we’ll ignore it.
At (A), the super member of klass is set to the super parameter. It looks like rb_class_boot() is a function that creates a class inheriting from super.
So rb_class_boot() is a function that creates a class, and rb_class_new() is almost identical.
Then, let’s once more look at rb_singleton_class()’s call graph:
    rb_singleton_class
        SPECIAL_SINGLETON
        rb_make_metaclass
            rb_class_boot
            rb_singleton_class_attached
Here also rb_class_boot() is called. So up to that point, it’s the same as in normal classes. What’s going on after is what’s different between normal classes and singleton classes, in other words the characteristics of singleton classes. If everything’s clear so far, we just need to read rb_singleton_class() and rb_make_metaclass().
Compressed rb_singleton_class()
rb_singleton_class() is a little long so we’ll first remove its non-essential parts.
▼ rb_singleton_class()
    #define SPECIAL_SINGLETON(x,c) do {\
        if (obj == (x)) {\
            return c;\
        }\
    } while (0)

    VALUE
    rb_singleton_class(obj)
        VALUE obj;
    {
        VALUE klass;

        if (FIXNUM_P(obj) || SYMBOL_P(obj)) {
            rb_raise(rb_eTypeError, "can't define singleton");
        }
        if (rb_special_const_p(obj)) {
            SPECIAL_SINGLETON(Qnil, rb_cNilClass);
            SPECIAL_SINGLETON(Qfalse, rb_cFalseClass);
            SPECIAL_SINGLETON(Qtrue, rb_cTrueClass);
            rb_bug("unknown immediate %ld", obj);
        }

        DEFER_INTS;
        if (FL_TEST(RBASIC(obj)->klass, FL_SINGLETON) &&
            (BUILTIN_TYPE(obj) == T_CLASS ||
             rb_iv_get(RBASIC(obj)->klass, "__attached__") == obj)) {
            klass = RBASIC(obj)->klass;
        }
        else {
            klass = rb_make_metaclass(obj, RBASIC(obj)->klass);
        }
        if (OBJ_TAINTED(obj)) {
            OBJ_TAINT(klass);
        }
        else {
            FL_UNSET(klass, FL_TAINT);
        }
        if (OBJ_FROZEN(obj)) OBJ_FREEZE(klass);
        ALLOW_INTS;

        return klass;
    }
(class.c)
The first and the second half are separated by a blank line. The first half handles special cases and the second half handles the general case. In other words, the second half is the trunk of the function. That’s why we’ll keep it for later and talk about the first half.
Everything that is handled in the first half is a non-pointer VALUE, which means that its object struct does not exist. First, Fixnum and Symbol are explicitly picked. Then, rb_special_const_p() is a function that returns true for non-pointer VALUEs, so there only Qtrue, Qfalse and Qnil should get caught. Other than that, there are no valid non-pointer VALUEs, so it would be reported as a bug with rb_bug().
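These special cases are visible from Ruby as well. The sketch below assumes a Ruby that provides Object#singleton_class (it does not exist in the version read in this book); try_singleton is a helper made up for the example.

```ruby
# Hypothetical helper: returns the singleton class, or the exception class
# when asking for it raises TypeError.
def try_singleton(obj)
  obj.singleton_class
rescue TypeError => e
  e.class
end

p try_singleton(1)     # => TypeError, the FIXNUM_P() branch
p try_singleton(:sym)  # => TypeError, the SYMBOL_P() branch
p try_singleton(nil)   # => NilClass
p try_singleton(true)  # => TrueClass
p try_singleton(false) # => FalseClass
```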
DEFER_INTS() and ALLOW_INTS() both end with the same INTS so you should see a pair in them. That’s the case, and they are macros related to signals. Because they are defined in rubysig.h, you can guess that INTS is the abbreviation of interrupts. You can ignore them.
Compressed rb_make_metaclass()
▼ rb_make_metaclass()
    VALUE
    rb_make_metaclass(obj, super)
        VALUE obj, super;
    {
        VALUE klass = rb_class_boot(super);
        FL_SET(klass, FL_SINGLETON);
        RBASIC(obj)->klass = klass;
        rb_singleton_class_attached(klass, obj);
        if (BUILTIN_TYPE(obj) == T_CLASS) {
            RBASIC(klass)->klass = klass;
            if (FL_TEST(obj, FL_SINGLETON)) {
                RCLASS(klass)->super =
                        RBASIC(rb_class_real(RCLASS(obj)->super))->klass;
            }
        }

        return klass;
    }
(class.c)
We already saw rb_class_boot(). It creates a (normal) class using the super parameter as its superclass. After that, the FL_SINGLETON of this class is set. This is clearly suspicious. The name of the flag makes us think that it is the indication of a singleton class.
What are singleton classes?
Finishing the above process, furthermore, we’ll throw away the declarations because parameters, return values and local variables are all VALUE. That makes us able to compress to the following:
▼ rb_singleton_class() rb_make_metaclass() (after compression)
    rb_singleton_class(obj)
    {
        if (FL_TEST(RBASIC(obj)->klass, FL_SINGLETON) &&
            (BUILTIN_TYPE(obj) == T_CLASS ||
             rb_iv_get(RBASIC(obj)->klass, "__attached__") == obj)) {
            klass = RBASIC(obj)->klass;
        }
        else {
            klass = rb_make_metaclass(obj, RBASIC(obj)->klass);
        }
        return klass;
    }

    rb_make_metaclass(obj, super)
    {
        klass = create a class with super as superclass;
        FL_SET(klass, FL_SINGLETON);
        RBASIC(obj)->klass = klass;
        rb_singleton_class_attached(klass, obj);
        if (BUILTIN_TYPE(obj) == T_CLASS) {
            RBASIC(klass)->klass = klass;
            if (FL_TEST(obj, FL_SINGLETON)) {
                RCLASS(klass)->super =
                        RBASIC(rb_class_real(RCLASS(obj)->super))->klass;
            }
        }

        return klass;
    }
The condition of the if statement of rb_singleton_class() seems quite complicated. However, this condition is not connected to rb_make_metaclass(), which is the mainstream, so we’ll see it later. Let’s first think about what happens on the false branch of the if.
The BUILTIN_TYPE() of rb_make_metaclass() is similar to TYPE() as it is a macro to get the structure type flag (T_xxxx). That means this check in rb_make_metaclass means “if obj is a class”. For the moment we assume that obj is not a class, so we’ll remove it.
With these simplifications, we get the following:
▼ rb_singleton_class() rb_make_metaclass() (after recompression)
    rb_singleton_class(obj)
    {
        klass = create a class with RBASIC(obj)->klass as superclass;
        FL_SET(klass, FL_SINGLETON);
        RBASIC(obj)->klass = klass;
        return klass;
    }
But there is still a side to it that is quite hard to understand. That’s because klass is used too often. So let’s rename the klass variable to sclass.
▼ rb_singleton_class() rb_make_metaclass() (variable substitution)
    rb_singleton_class(obj)
    {
        sclass = create a class with RBASIC(obj)->klass as superclass;
        FL_SET(sclass, FL_SINGLETON);
        RBASIC(obj)->klass = sclass;
        return sclass;
    }
Now it should be very easy to understand. To make it even simpler, I’ve represented what is done with a diagram (figure 1). In the horizontal direction is the “instance – class” relation, and in the vertical direction is inheritance (the superclasses are above).
Figure 1: rb_singleton_class
When comparing the first and last part of this diagram, you can understand that sclass is inserted without changing the structure. That’s all there is to singleton classes. In other words the inheritance is increased one step. By defining methods there, we can define methods which have completely nothing to do with other instances of klass.
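The same insertion can be observed at the Ruby level. This is a sketch assuming a Ruby with Object#singleton_class; the variable names are chosen only for the example.

```ruby
first = Object.new
second = Object.new

def first.hello
  "hi"
end

sclass = first.singleton_class
p sclass.superclass == Object # => true, sclass sits in front of the old class
p first.class                 # => Object, the visible class is unchanged
p first.singleton_methods     # => [:hello]
p second.respond_to?(:hello)  # => false, other instances are not affected
```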
Singleton classes and instances
By the way, did you notice that, during the compression process, the call to rb_singleton_class_attached() was stealthily removed? Here:
    rb_make_metaclass(obj, super)
    {
        klass = create a class with super as superclass;
        FL_SET(klass, FL_SINGLETON);
        RBASIC(obj)->klass = klass;
        rb_singleton_class_attached(klass, obj);   /* THIS */
Let’s have a look at what it does.
▼ rb_singleton_class_attached()
    void
    rb_singleton_class_attached(klass, obj)
        VALUE klass, obj;
    {
        if (FL_TEST(klass, FL_SINGLETON)) {
            if (!RCLASS(klass)->iv_tbl) {
                RCLASS(klass)->iv_tbl = st_init_numtable();
            }
            st_insert(RCLASS(klass)->iv_tbl, rb_intern("__attached__"), obj);
        }
    }
(class.c)
If the FL_SINGLETON flag of klass is set… in other words if it’s a singleton class, put the __attached__ → obj relation in the instance variable table of klass (iv_tbl). That’s how it looks (in our case klass is always a singleton class… in other words its FL_SINGLETON flag is always set).
__attached__ does not have the @ prefix, but it’s stored in the instance variables table so it’s still an instance variable. Such an instance variable can never be read at the Ruby level so it can be used to keep values for the system’s exclusive use.
Let’s now think about the relationship between klass and obj. klass is the singleton class of obj. In other words, this “invisible” instance variable allows the singleton class to remember the instance it was created from. Its value is used when the singleton class is changed, notably to call hook methods on the instance (i.e. obj). For example, when a method is added to a singleton class, the obj’s singleton_method_added method is called. There is no logical necessity to doing it, it was done because that’s how it was defined in the language.
But is it really all right? Storing the instance in __attached__ will force one singleton class to have only one attached instance. For example, by getting (in some way or another) the singleton class and calling new on it, won’t a singleton class end up having multiple instances?
This cannot be done because the proper checks are done to prevent the creation of an instance of a singleton class.
Singleton classes are in the first place for singleton methods. Singleton methods are methods existing only on a particular object. If singleton classes could have multiple instances, they would be the same as normal classes. Hence, each singleton class has only one instance… or rather, it must be limited to one.
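Both the hook and the instantiation check can be tried from Ruby. The sketch below assumes a current Ruby (singleton_class and define_singleton_method are not available in the version read here); the array named log exists only for the demonstration.

```ruby
log = []
obj = Object.new

# The hook is itself a singleton method of obj, so defining it is reported too.
obj.define_singleton_method(:singleton_method_added) { |name| log << name }

def obj.greet
  "hello"
end

p log.last # => :greet, the hook was called on the attached instance

sclass = obj.singleton_class
begin
  sclass.new
rescue TypeError => e
  puts "refused: #{e.class}" # a singleton class cannot be instantiated
end
```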
Summary
We’ve done a lot, maybe made a real mayhem, so let’s finish and put everything in order with a summary.
What are singleton classes? They are classes that have the FL_SINGLETON flag set and that can only have one instance.
What are singleton methods? They are methods defined in the singleton class of an object.
Metaclasses
Inheritance of singleton methods
Infinite chain of classes
Even a class has a class, and it’s Class. And the class of Class is again Class. We find ourselves in an infinite loop (figure 2).
Figure 2: Infinite loop of classes
Up to here it’s something we’ve already gone through. What’s going after that is the theme of this chapter. Why do classes have to make a loop?
First, in Ruby all data are objects. And classes are data in Ruby so they have to be objects.
As they are objects, they must answer to methods. And setting the rule “to answer to methods you must belong to a class” made processing easier. That’s where comes the need for a class to also have a class.
Let’s base ourselves on this and think about the way to implement it. First, we can try the most naïve way: Class’s class is ClassClass, ClassClass’s class is ClassClassClass…, chaining classes of classes one by one. But whichever way you look at it, this can’t be implemented effectively. That’s why it’s common in object oriented languages where classes are objects that Class’s class is Class itself, creating an endless virtual instance-class relationship.
((errata: This structure is implemented efficiently in recent Ruby 1.8, thus it can be implemented efficiently.))
I’m repeating myself, but the fact that Class’s class is Class is only to make the implementation easier, there’s nothing important in this logic.
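The loop is easy to see by following class repeatedly from any object. A minimal sketch:

```ruby
chain = ["a string"]
4.times { chain << chain.last.class }
p chain.drop(1)           # => [String, Class, Class, Class]
p Class.class == Class    # => true
```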
“Class is also an object”
“Everything is an object” is often used as an advertising statement when speaking about Ruby. And as a part of that, “Classes are also objects!” also appears. But these expressions often go too far. When thinking about these sayings, we have to split them in two:
all data are objects
classes are data
Talking about data or code makes a discussion much harder to understand. That’s why here we’ll restrict the meaning of “data” to “what can be put in variables in programs”.
Being able to manipulate classes from programs gives programs the ability to manipulate themselves. This is called reflection. In Ruby, which is an object oriented language and furthermore has classes, it is equivalent to being able to directly manipulate classes.
Nevertheless, there’s also a way in which classes are not objects. For example, there’s no problem in providing a feature to manipulate classes as function-style methods (functions defined at the top-level). However, as inside the interpreter there are data structures to represent the classes, it’s more natural in object oriented languages to make them available directly. And Ruby made this choice.
Furthermore, an objective in Ruby is for all data to be objects. That’s why it’s appropriate to make them objects.
By the way, there is also a reason not linked to reflection why in Ruby classes had to be made objects. That is to prepare the place to define methods which are independent from instances (what are called static methods in Java and C++).
And to implement static methods, another thing was necessary: singleton methods. By chain reaction, that also makes singleton classes necessary. Figure 3 shows these dependency relationships.
Figure 3: Requirements dependencies
Class methods inheritance
In Ruby, singleton methods defined in a class are called class methods. However, their specification is a little strange. For some reasons, class methods are inheritable.
    class A
      def A.test    # defines a singleton method in A
        puts("ok")
      end
    end

    class B < A
    end

    B.test()        # calls it
This can’t occur with singleton methods from objects that are not classes. In other words, classes are the only ones handled specially. In the following section we’ll see how class methods are inherited.
Singleton class of a class
Assuming that class methods are inherited, where is this operation done? It must be done either at class definition (creation) or at singleton method definition. Then let’s first look at the code defining classes.
Class definition means of course rb_define_class(). Now let’s take the call graph of this function.
    rb_define_class
        rb_class_inherited
        rb_define_class_id
            rb_class_new
                rb_class_boot
            rb_make_metaclass
                rb_class_boot
                rb_singleton_class_attached
If you’re wondering where you’ve seen it before, we looked at it in the previous section. At that time you did not see it but if you look closely, somehow rb_make_metaclass() appeared. As we saw before, this function introduces a singleton class. This is very suspicious. Why is this called even if we are not defining a singleton function? Furthermore, why is the lower level rb_make_metaclass() used instead of rb_singleton_class()? It looks like we have to check these surroundings again.
rb_define_class_id()
Let’s first start our reading with its caller, rb_define_class_id().
▼ rb_define_class_id()
    VALUE
    rb_define_class_id(id, super)
        ID id;
        VALUE super;
    {
        VALUE klass;

        if (!super) super = rb_cObject;
        klass = rb_class_new(super);
        rb_name_class(klass, id);
        rb_make_metaclass(klass, RBASIC(super)->klass);

        return klass;
    }
(class.c)
rb_class_new() was a function that creates a class with super as its superclass. rb_name_class()’s name means it names a class, but for the moment we do not care about names so we’ll skip it. After that there’s the rb_make_metaclass() in question. I’m concerned by the fact that when called from rb_singleton_class(), the parameters were different. Last time was like this:
    rb_make_metaclass(obj, RBASIC(obj)->klass);
But this time is like this:
    rb_make_metaclass(klass, RBASIC(super)->klass);
So as you can see it’s slightly different. How do the results change depending on that? Let’s have once again a look at a simplified rb_make_metaclass().
rb_make_metaclass (once more)
▼ rb_make_metaclass (after first compression)
    rb_make_metaclass(obj, super)
    {
        klass = create a class with super as superclass;
        FL_SET(klass, FL_SINGLETON);
        RBASIC(obj)->klass = klass;
        rb_singleton_class_attached(klass, obj);
        if (BUILTIN_TYPE(obj) == T_CLASS) {
            RBASIC(klass)->klass = klass;
            if (FL_TEST(obj, FL_SINGLETON)) {
                RCLASS(klass)->super =
                        RBASIC(rb_class_real(RCLASS(obj)->super))->klass;
            }
        }

        return klass;
    }
Last time, the if statement was wholly skipped, but looking once again, something is done only for T_CLASS, in other words classes. This clearly looks important. In rb_define_class_id(), it’s called like this:
    rb_make_metaclass(klass, RBASIC(super)->klass);
Let’s expand rb_make_metaclass()’s parameter variables with the actual values.
▼ rb_make_metaclass (recompression)
    rb_make_metaclass(klass, super_klass /* == RBASIC(super)->klass */)
    {
        sclass = create a class with super_klass as superclass;
        RBASIC(klass)->klass = sclass;
        RBASIC(sclass)->klass = sclass;
        return sclass;
    }
Doing this as a diagram gives something like figure 4. In it, the names between parentheses are singleton classes. This notation is often used in this book so I’d like you to remember it. This means that obj’s singleton class is written as (obj). And (klass) is the singleton class for klass. It looks like the singleton class is caught between a class and this class’s superclass’s class.
Figure 4: Introduction of a class’s singleton class
By expanding our imagination further from this result, we can think that the superclass’s class (the c in figure 4) must again be a singleton class. You’ll understand with one more inheritance level (figure 5).
Figure 5: Hierarchy of multi-level inheritance
As the relationship between super and klass is the same as the one between klass and klass2, c must be the singleton class (super). If you continue like this, finally you’ll arrive at the conclusion that Object’s class must be (Object). And that’s the case in practice. For example, by inheriting like in the following program:
    class A < Object
    end
    class B < A
    end
internally, a structure like figure 6 is created.
Figure 6: Class hierarchy and metaclasses
As classes and their metaclasses are linked and inherit like this, class methods are inherited.
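The parallel hierarchy of metaclasses can be checked from Ruby. This sketch assumes a Ruby that provides singleton_class, and reuses the class names A and B from the example above.

```ruby
class A
  def A.test # a singleton method of A, stored in (A)
    "ok"
  end
end

class B < A
end

p B.test                                            # => "ok"
p B.singleton_class.superclass == A.singleton_class # => true, (B) inherits from (A)
p A.singleton_class.instance_methods(false)         # => [:test]
p B.singleton_class.instance_methods(false)         # => [], found through (A)
```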
Class of a class of a class
You’ve understood the working of class methods inheritance, but by doing that, on the other hand some questions have appeared. What is the class of a class’s singleton class? For this, we can check it by using debuggers. I’ve made figure 7 from the results of this investigation.
Figure 7: Class of a class’s singleton class
A class’s singleton class puts itself as its own class. Quite complicated.
The second question: the class of Object must be Class. Didn’t I properly confirm this in chapter 1: Ruby language minimum by using class() method?
    p(Object.class())      # Class
Certainly, that’s the case “at the Ruby level”. But “at the C level”, it’s the singleton class (Object). If (Object) does not appear at the Ruby level, it’s because Object#class skips the singleton classes. Let’s look at the body of the method, rb_obj_class() to confirm that.
▼ rb_obj_class()
    VALUE
    rb_obj_class(obj)
        VALUE obj;
    {
        return rb_class_real(CLASS_OF(obj));
    }

    VALUE
    rb_class_real(cl)
        VALUE cl;
    {
        while (FL_TEST(cl, FL_SINGLETON) || TYPE(cl) == T_ICLASS) {
            cl = RCLASS(cl)->super;
        }
        return cl;
    }
(object.c)
CLASS_OF(obj) returns the basic.klass of obj. Then in rb_class_real(), all singleton classes are skipped (advancing towards the superclass). In the first place, singleton classes are caught between a class and its superclass, like a proxy. That’s why when a “real” class is necessary, we have to follow the superclass chain (figure 8).
T_ICLASS will appear later when we talk about include.
Figure 8: Singleton class and real class
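To make the skipping concrete, here is a toy model of the loop in rb_class_real(). It is a sketch, not ruby’s real data structure: RhgNode, its kind field and the three sample nodes are assumptions made only for this example.

```ruby
# Toy model (not ruby's real structs): a node has a kind, a name and a super link.
RhgNode = Struct.new(:kind, :name, :super_node)

# Mirrors the loop of rb_class_real(): skip singleton classes and include classes.
# The nil guard is an addition of this sketch; the C code has no such check.
def rhg_class_real(cl)
  cl = cl.super_node while cl && [:singleton, :iclass].include?(cl.kind)
  cl
end

RHG_OBJECT    = RhgNode.new(:class, "Object", nil)
RHG_STRING    = RhgNode.new(:class, "String", RHG_OBJECT)
# The singleton class (str) sits between an object and its real class.
RHG_SINGLETON = RhgNode.new(:singleton, "(str)", RHG_STRING)

puts rhg_class_real(RHG_SINGLETON).name   # reached by following super once
puts rhg_class_real(RHG_STRING).name      # already a real class, returned as is
```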
Singleton class and metaclass
Well, the singleton classes that were introduced in classes are also one type of class: each of them is a class’s class. So it can be called a metaclass.
However, you should be wary of the fact that being a singleton class does not mean being a metaclass. The singleton classes introduced in classes are metaclasses. The important fact is not that they are singleton classes, but that they are the classes of classes. I was stuck on this point when I started learning Ruby. As I may not be the only one, I would like to make this clear.
Thinking about this, the rb_make_metaclass() function name is not very good. When used for a class, it does indeed create a metaclass, but when used for other objects, the created class is not a metaclass.
Then finally, even if you understand that some classes are metaclasses, it’s not as if there is any concrete gain. I’d like you not to care too much about it.
Bootstrap
We have nearly finished our talk about classes and metaclasses. But there is still one problem left. It’s about the 3 metaobjects Object, Module and Class. These 3 cannot be created with the common-use API. To make a class, its metaclass must be built, but as we saw some time ago, the metaclass’s superclass is Class. However, as Class has not been created yet, the metaclass cannot be built. So in ruby, only the creation of these 3 classes is handled specially.
Then let’s look at the code:
▼ Object, Module and Class creation
rb_cObject = boot_defclass("Object", 0);
rb_cModule = boot_defclass("Module", rb_cObject);
rb_cClass  = boot_defclass("Class",  rb_cModule);

metaclass = rb_make_metaclass(rb_cObject, rb_cClass);
metaclass = rb_make_metaclass(rb_cModule, metaclass);
metaclass = rb_make_metaclass(rb_cClass, metaclass);
(object.c)
First, in the first half, boot_defclass() is similar to rb_class_boot(): it just creates a class with its given superclass set. These links give us something like the left part of figure 9.
And in the three lines of the second half, (Object), (Module) and (Class) are created and set (right part of figure 9). The classes of (Object) and (Module), that is themselves, are already set in rb_make_metaclass() so there is no problem. With this, the metaobjects’ bootstrap is finished.
Figure 9: Metaobjects creation
After taking everything into account, it gives us the final shape like figure 10.
Figure 10: Ruby metaobjects
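Part of figure 10 can be checked from Ruby. This is only the Ruby-level view: as explained above, Object#class skips singleton classes, so (Object), (Module) and (Class) stay hidden. I am also assuming a current Ruby here, in which Object additionally has BasicObject above it, something the ruby of this book does not have.

```ruby
# The class of each of the three metaobjects, as reported at the Ruby level.
p Object.class
p Module.class
p Class.class

# The superclass links created by the three boot_defclass() calls.
p Class.superclass
p Module.superclass
```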
Class names
In this section, we will analyse how the two-way conversion between classes and class names, in other words constants, is implemented. Concretely, we will target rb_define_class() and rb_define_class_under().
Name → class
First we’ll read rb_define_class(). After the end of this function, the class can be found from the constant.
▼ rb_define_class()
VALUE
rb_define_class(name, super)
    const char *name;
    VALUE super;
{
    VALUE klass;
    ID id;

    id = rb_intern(name);
    if (rb_autoload_defined(id)) {             /* (A) autoload */
        rb_autoload_load(id);
    }
    if (rb_const_defined(rb_cObject, id)) {    /* (B) rb_const_defined */
        klass = rb_const_get(rb_cObject, id);  /* (C) rb_const_get */
        if (TYPE(klass) != T_CLASS) {
            rb_raise(rb_eTypeError, "%s is not a class", name);
        }                                      /* (D) rb_class_real */
        if (rb_class_real(RCLASS(klass)->super) != super) {
            rb_name_error(id, "%s is already defined", name);
        }
        return klass;
    }
    if (!super) {
        rb_warn("no super class for '%s', Object assumed", name);
    }
    klass = rb_define_class_id(id, super);
    rb_class_inherited(super, klass);
    st_add_direct(rb_class_tbl, id, klass);

    return klass;
}
(class.c)
This can be clearly divided into two parts: before and after rb_define_class_id(). The former is to acquire or create the class. The latter is to assign it to the constant. We will look at it in more detail below.
(A) In Ruby, there is a feature named autoload that automatically loads libraries when certain constants are accessed. The functions named rb_autoload_xxxx() are the checks for it. You can ignore them without any problem.
(B) We determine whether the name constant has been defined or not in Object.
(C) Get the value of the name constant. This will be explained in detail in chapter 6.
(D) We’ve seen rb_class_real() some time ago. If the class c is a singleton class or an ICLASS, it climbs the super hierarchy up to a class that is neither, and returns it. In short, this function skips the virtual classes that should not appear at the Ruby level.
That’s what we can read nearby.
As constants are involved around this, it is very troublesome. But I feel that the chapter about constants is probably not the right place to talk about class definition either; that’s the reason for such a halfway description here.
Moreover, about this line coming after rb_define_class_id(),
st_add_direct(rb_class_tbl, id, klass);
This part assigns the class to the constant. However, whichever way you look at it you do not see that. In fact, top-level classes and modules that are defined in C are separated from the other constants and regrouped in rb_class_tbl. The split is slightly related to the GC. It’s not essential.
Class → name
We understood how the class can be obtained from the class name, but how to do the opposite? By doing things like calling p or Class#name, we can get the name of the class, but how is it implemented?
In fact this is done by rb_name_class(), which already appeared a long time ago. The call chain is the following:
rb_define_class
    rb_define_class_id
        rb_name_class
Let’s look at its content:
▼ rb_name_class()
void
rb_name_class(klass, id)
    VALUE klass;
    ID id;
{
    rb_iv_set(klass, "__classid__", ID2SYM(id));
}
(variable.c)
__classid__ is another instance variable that can’t be seen from Ruby. As only VALUEs can be put in the instance variable table, the ID is converted to a Symbol using ID2SYM().
That’s how we are able to find the constant name from the class.
Nested classes
So, in the case of classes defined at the top-level, we know how the two-way link between name and class works. What’s left is the case of classes defined in modules or other classes, and that is a little more complicated. The function to define these nested classes is rb_define_class_under().
▼ rb_define_class_under()
VALUE
rb_define_class_under(outer, name, super)
    VALUE outer;
    const char *name;
    VALUE super;
{
    VALUE klass;
    ID id;

    id = rb_intern(name);
    if (rb_const_defined_at(outer, id)) {
        klass = rb_const_get(outer, id);
        if (TYPE(klass) != T_CLASS) {
            rb_raise(rb_eTypeError, "%s is not a class", name);
        }
        if (rb_class_real(RCLASS(klass)->super) != super) {
            rb_name_error(id, "%s is already defined", name);
        }
        return klass;
    }
    if (!super) {
        rb_warn("no super class for '%s::%s', Object assumed",
                rb_class2name(outer), name);
    }
    klass = rb_define_class_id(id, super);
    rb_set_class_path(klass, outer, name);
    rb_class_inherited(super, klass);
    rb_const_set(outer, id, klass);

    return klass;
}
(class.c)
The structure is like the one of rb_define_class(): before the call to rb_define_class_id() is the redefinition check, after it is the creation of the two-way link between constant and class. The first half is boringly similar to rb_define_class() so we’ll skip it. In the second half, rb_set_class_path() is new. We’re going to look at it.
rb_set_class_path()
This function gives the name name to the class klass nested in the class under. “class path” means a constant name including all the nesting information starting from top-level, for example “Net::NetPrivate::Socket”.
▼ rb_set_class_path()
void
rb_set_class_path(klass, under, name)
    VALUE klass, under;
    const char *name;
{
    VALUE str;

    if (under == rb_cObject) {
        /* defined at top-level */
        str = rb_str_new2(name);    /* create a Ruby string from name */
    }
    else {
        /* nested constant */
        str = rb_str_dup(rb_class_path(under));  /* copy the return value */
        rb_str_cat2(str, "::");     /* concatenate "::" */
        rb_str_cat2(str, name);     /* concatenate name */
    }
    rb_iv_set(klass, "__classpath__", str);
}
(variable.c)
Everything except the last line is the construction of the class path, and the last line makes the class remember its own name. __classpath__ is of course another instance variable that can’t be seen from a Ruby program. In rb_name_class() there was __classid__, but an id is different because it does not include nesting information (look at the table below).
__classpath__    Net::NetPrivate::Socket
__classid__      Socket
It means that classes defined for example by rb_define_class() all have __classid__ or __classpath__ defined. So to find under’s class path we can look it up in these instance variables. This is done by rb_class_path(). We’ll omit its content.
Nameless classes
Contrary to what I have just said, there are in fact cases in which neither __classpath__ nor __classid__ is set. That is because in Ruby you can use a method like the following to create a class.
c = Class.new()
If a class is created like this, it won’t go through rb_define_class_id() and the class path won’t be set. In this case, c does not have any name, which is to say we get an unnamed class.
However, if it’s later assigned to a constant, a name will be attached to the class at that moment.
SomeClass = c   # the class name is SomeClass
Strictly speaking, the name is attached to the class the first time it is requested after the assignment to a constant. For instance, when calling p on this SomeClass class or when calling the Class#name method. When doing this, a value equal to the class is searched for in rb_class_tbl, and a name has to be chosen. The following case can also happen:
class A
  class B
    C = tmp = Class.new()
    p(tmp)   # here we search for the name
  end
end
so in the worst case we have to search the whole constant space. However, generally, there aren’t many constants so even searching all constants does not take too much time.
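The visible effect of this naming can be tried out as follows. It is only a sketch under the assumption of a current Ruby, where an unnamed class reports nil as its name and where the moment at which the name gets attached may differ from the lazy search described above. All constant names starting with Rhg or RHG are invented for the example.

```ruby
anon = Class.new
RHG_NAME_BEFORE = anon.name     # no constant refers to the class yet

RhgSomeClass = anon             # assigning to a constant gives the class a name
RHG_NAME_AFTER = anon.name

module RhgOuter
  class Inner
    Deep = Class.new            # nested case: the name includes the whole path
  end
end

p RHG_NAME_BEFORE
p RHG_NAME_AFTER
p RhgOuter::Inner::Deep.name
```

The last line is the Ruby-level counterpart of the __classpath__ string, while the bare Deep corresponds to __classid__.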
Include
We have only talked about classes so far, so let’s finish this chapter with something else and talk about module inclusion.
rb_include_module (1)
Includes are done by the ordinary method Module#include. Its corresponding function in C is rb_include_module(). In fact, to be precise, its body is rb_mod_include(), and there Module#append_features is called, and this function’s default implementation finally calls rb_include_module(). Mixing what’s happening in Ruby and C gives us the following call graph.
Module#include (rb_mod_include)
    Module#append_features (rb_mod_append_features)
        rb_include_module
Anyway, the manipulations that are usually regarded as inclusions are done by rb_include_module(). This function is a little long so we’ll look at it a half at a time.
▼ rb_include_module (first half)
/* include module in class */
void
rb_include_module(klass, module)
    VALUE klass, module;
{
    VALUE p, c;
    int changed = 0;

    rb_frozen_class_p(klass);
    if (!OBJ_TAINTED(klass)) {
        rb_secure(4);
    }

    if (NIL_P(module)) return;
    if (klass == module) return;

    switch (TYPE(module)) {
      case T_MODULE:
      case T_CLASS:
      case T_ICLASS:
        break;
      default:
        Check_Type(module, T_MODULE);
    }
(class.c)
For the moment it’s only security and type checking, therefore we can ignore it. The process itself is below:
▼ rb_include_module (second half)
    OBJ_INFECT(klass, module);
    c = klass;
    while (module) {
        int superclass_seen = Qfalse;

        if (RCLASS(klass)->m_tbl == RCLASS(module)->m_tbl)
            rb_raise(rb_eArgError, "cyclic include detected");
        /* (A) skip if the superclass already includes module */
        for (p = RCLASS(klass)->super; p; p = RCLASS(p)->super) {
            switch (BUILTIN_TYPE(p)) {
              case T_ICLASS:
                if (RCLASS(p)->m_tbl == RCLASS(module)->m_tbl) {
                    if (!superclass_seen) {
                        c = p;  /* move the insertion point */
                    }
                    goto skip;
                }
                break;
              case T_CLASS:
                superclass_seen = Qtrue;
                break;
            }
        }
        c = RCLASS(c)->super = include_class_new(module, RCLASS(c)->super);
        changed = 1;
      skip:
        module = RCLASS(module)->super;
    }
    if (changed) rb_clear_cache();
}
(class.c)
First, what the (A) block does is written in the comment. It seems to be a special condition so let’s skip reading it for now. By extracting the important parts from the rest we get the following:
c = klass;
while (module) {
    c = RCLASS(c)->super = include_class_new(module, RCLASS(c)->super);
    module = RCLASS(module)->super;
}
In other words, it’s a repetition over module’s super. What is in module’s super must be a module included by module (because our intuition tells us so). Then the superclass of the class where the inclusion occurs is replaced with something. We do not understand well with what, but at the moment I saw that I felt “Ah, doesn’t this look like the addition of elements to a list (like LISP’s cons)?” and it suddenly made the story go faster. In other words it’s the following form:
list = new(item, list)
Thinking about this, it seems we can expect that module is inserted between c and c->super. If it’s like this, it fits the specification of modules.
But to be sure of this we have to look at include_class_new().
include_class_new()
▼ include_class_new()
static VALUE
include_class_new(module, super)
    VALUE module, super;
{
    NEWOBJ(klass, struct RClass);               /* (A) */
    OBJSETUP(klass, rb_cClass, T_ICLASS);

    if (BUILTIN_TYPE(module) == T_ICLASS) {
        module = RBASIC(module)->klass;
    }
    if (!RCLASS(module)->iv_tbl) {
        RCLASS(module)->iv_tbl = st_init_numtable();
    }
    klass->iv_tbl = RCLASS(module)->iv_tbl;     /* (B) */
    klass->m_tbl = RCLASS(module)->m_tbl;
    klass->super = super;                       /* (C) */
    if (TYPE(module) == T_ICLASS) {             /* (D) */
        RBASIC(klass)->klass = RBASIC(module)->klass;   /* (D-1) */
    }
    else {
        RBASIC(klass)->klass = module;                  /* (D-2) */
    }
    OBJ_INFECT(klass, module);
    OBJ_INFECT(klass, super);

    return (VALUE)klass;
}
(class.c)
We’re lucky there’s nothing we do not know.
(A) First create a new class.
(B) Transplant module’s instance variable and method tables into this class.
(C) Make the including class’s superclass (super) the superclass of this new class.
In other words, it looks like this function creates an include class which we can regard as something like an “avatar” of the module. The important point is that at (B) only the pointer is passed on, without duplicating the table. Later, if a method is added, the module’s body and the include class will still have exactly the same methods (figure 11).
Figure 11: Include class
If you look closely at (A), the structure type flag is set to T_ICLASS. This seems to be the mark of an include class. This function’s name is include_class_new() so the I of ICLASS must be include.
And if you think about joining what this function and rb_include_module() do, we know that our previous expectations were not wrong. In brief, including is inserting the include class of a module between a class and its superclass (figure 12).
Figure 12: Include
At (D-2) the module is stored in the include class’s klass. At (D-1), the module’s body is taken out… I’d like to say so if possible, but in fact this check does not have any use. The T_ICLASS check is already done at the beginning of this function, so when arriving here there can’t still be a T_ICLASS. Modifications to ruby piled up piece by piece during quite a long period of time so there are quite a few small oversights.
There is one more thing to consider. The include class’s basic.klass is only used to point to the module’s body, so for example calling a method on the include class would be very bad. So include classes must not be seen from Ruby programs. And in practice all methods skip include classes, with no exception.
Simulation
It was complicated so let’s look at a concrete example. I’d like you to look at figure 13 (1). We have the c1 class and the m1 module that includes m2. From there, the changes made to include m1 in c1 are (2) and (3). The ims are of course include classes.
Figure 13: Include
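The same simulation can be played with a toy model. The following sketch imitates include_class_new() and the simplified loop of rb_include_module() with plain Ruby structs. RhgKlass and its fields are assumptions of this example, not ruby’s real RClass, and the double-inclusion check is left out on purpose.

```ruby
# Toy model of RClass: only the fields that matter for include.
RhgKlass = Struct.new(:type, :name, :m_tbl, :super_k, :klass)

# Counterpart of include_class_new(): an "avatar" sharing the module's method table.
def rhg_include_class_new(mod, super_k)
  mod = mod.klass if mod.type == :iclass      # take out the module's body
  RhgKlass.new(:iclass, "i#{mod.name}", mod.m_tbl, super_k, mod)
end

# Counterpart of the simplified loop (no double-inclusion check).
def rhg_include_module(klass, mod)
  c = klass
  while mod
    c = c.super_k = rhg_include_class_new(mod, c.super_k)
    mod = mod.super_k
  end
end

# Helper of this sketch: list the names found by following super.
def rhg_chain(klass)
  names = []
  while klass
    names << klass.name
    klass = klass.super_k
  end
  names
end

RHG_OBJ = RhgKlass.new(:class,  "Object", {}, nil, nil)
RHG_M2  = RhgKlass.new(:module, "m2", {}, nil, nil)
RHG_M1  = RhgKlass.new(:module, "m1", {}, nil, nil)
RHG_C1  = RhgKlass.new(:class,  "c1", {}, RHG_OBJ, nil)

rhg_include_module(RHG_M1, RHG_M2)    # state (1): m1 includes m2
rhg_include_module(RHG_C1, RHG_M1)    # steps (2) and (3): c1 includes m1
p rhg_chain(RHG_C1)
```

Because the include class only copies the reference to m_tbl, a method added to m1 later is visible through im1 as well, which is the point made about (B).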
rb_include_module (2)
Well, now we can explain the part of rb_include_module() we skipped.
▼ rb_include_module (avoiding double inclusion)
        /* (A) skip if the superclass already includes module */
        for (p = RCLASS(klass)->super; p; p = RCLASS(p)->super) {
            switch (BUILTIN_TYPE(p)) {
              case T_ICLASS:
                if (RCLASS(p)->m_tbl == RCLASS(module)->m_tbl) {
                    if (!superclass_seen) {
                        c = p;  /* the inserting point is moved */
                    }
                    goto skip;
                }
                break;
              case T_CLASS:
                superclass_seen = Qtrue;
                break;
            }
        }
(class.c)
Among the superclasses of klass (p), if a p is T_ICLASS (an include class) and has the same method table as the module we want to include (module), it means that p is an include class of module. Therefore, it is skipped so that the module is not included twice. However, if this module includes another module (module->super), that one is checked once more.
But, because p is a module that has been included once, the modules included by it must also already be included… that’s what I thought for a moment, but we can have the following context:
module M
end
module M2
end
class C
  include M   # M2 is not yet included in M
end           # therefore M2 is not in C's superclasses

module M
  include M2  # as there M2 is included in M,
end
class C
  include M   # I would like here to only add M2
end
To say this conversely, there are cases in which the result of an include is not propagated immediately.
For class inheritance, the class’s singleton methods were inherited, but in the case of modules there is no such thing. Therefore the singleton methods of a module are not inherited by the including class (or module). When you want to also inherit singleton methods, the usual way is to override Module#append_features.
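Here is a small sketch of that last point. The module and class names are invented for the example, and the ClassMethods helper module is just one common way to arrange it (current Ruby code more often uses the Module#included hook for the same purpose).

```ruby
module RhgPlain
  def self.helper          # singleton method of the module itself
    :plain
  end
end

module RhgWithClassMethods
  module ClassMethods
    def helper
      :extended
    end
  end

  # Override: do the normal inclusion, then also extend the includer.
  def self.append_features(base)
    super
    base.extend(ClassMethods)
  end
end

class RhgUserA
  include RhgPlain
end

class RhgUserB
  include RhgWithClassMethods
end

p RhgUserA.respond_to?(:helper)   # the module's singleton method did not come along
p RhgUserB.helper                 # added through the append_features override
```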
The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike2.5 License
Translated by Sebastian Krause & ocha-
Chapter 5: Garbage Collection
A conception of an executing program
It’s all of a sudden, but at the beginning of this chapter we’ll learn about the memory space of an executing program. In this chapter we’ll step inside the lower level parts of a computer quite a bit, so without preliminary knowledge it’ll be hard to follow. And it’ll also be necessary for the following chapters. Once we finish this here, the rest will be easier.
Memory Segments
A general C program has the following parts in the memory space:
1. the text area
2. a place for static and global variables
3. the machine stack
4. the heap
The text area is where the code lies. Obviously the second area holds static and global variables. Arguments and local variables of functions pile up in the machine stack. The heap is the place where memory allocated by malloc() lives.
Let’s talk a bit more about number three, the machine stack. Since it is called the machine “stack”, obviously it has a stack structure. In other words, new stuff is piled on top of it one after another. When we actually push values on the stack, each value is a tiny piece such as an int. But logically, there are slightly larger pieces. They are called stack frames.
One stack frame corresponds to one function call. In other words, when there is a function call, one stack frame is pushed. On return, one stack frame is popped. Figure 1 shows a really simplified appearance of the machine stack.
Figure 1: Machine Stack
In this picture, “above” is written above the top of the stack, but it is not necessarily always the case that the machine stack goes from low addresses to high addresses. For instance, on x86 machines the stack goes from high to low addresses.
alloca()
By using malloc(), we can get an arbitrarily large memory area of the heap. alloca() is the machine stack version of it. But unlike malloc() it’s not necessary to free the memory allocated with alloca(). Or one should say: it is freed automatically at the moment the calling function returns. That’s why it’s not possible to use an allocated value as the return value. It’s the same as “You must not return the pointer to a local variable.”
There is nothing difficult so far. We can consider it something to locally allocate an array whose size can be decided at runtime.
However there exist environments where there is no native alloca(). Many people still want to use alloca() even in such an environment, so sometimes a function that does the same thing is written in C. But in that case, only the feature that we don’t have to free the memory by ourselves is implemented, and it does not necessarily allocate the memory on the machine stack. In fact, it often does not. If that were possible, a native alloca() could have been implemented in the first place.
How can one implement alloca() in C? The simplest implementation is: first allocate memory normally with malloc(). Then remember the pair of the function which called alloca() and the assigned address in a global list. After that, check this list whenever alloca() is called; if there is memory allocated for functions that have already finished, free it by using free().
Figure 2: The behavior of an alloca() implemented in C
The missing/alloca.c of ruby is an example of an emulated alloca().
Overview
From here on we can at last talk about the main subject of this chapter: garbage collection.
What is GC?
Objects normally reside in memory. Naturally, if a lot of objects are created, a lot of memory is used. If memory were infinite there would be no problem, but in reality there is always a memory limit. That’s why the memory which is not used anymore must be collected and recycled. More concretely, the memory received through malloc() must be returned with free().
However, it would require a lot of effort if the management of malloc() and free() were entirely left to programmers. Especially in object oriented programs, because objects refer to each other, it is difficult to tell when to release memory.
That is where garbage collection comes in. Garbage Collection (GC) is a feature to automatically detect and free the memory which has become unnecessary. With garbage collection, the worry “When should I free() this?” becomes unnecessary. The ease of writing programs differs considerably between having it and not having it.
By the way, in a book that I’ve read, there’s a description “the thing to tidy up the fragmented usable memory is GC”. This task is called “compaction”. It is compaction because it makes a thing compact. Because compaction makes memory cache hits more frequent, it has some speed-up effect, but it is not the main purpose of GC. The purpose of GC is to collect memory. There are many GCs which collect memory but don’t do compaction. The GC of ruby also does not do compaction.
Then, in what kind of system is GC available? In C and C++, there’s Boehm GC (http://www.hpl.hp.com/personal/Hans_Boehm/gc), which can be used as an add-on. And for recent languages such as Java, Perl, Python, C# and Eiffel, GC is standard equipment. And of course, Ruby has its GC. Let’s follow the details of ruby’s GC in this chapter. The target file is gc.c.
What does GC do?
Before explaining the GC algorithm, I should explain “what garbage collection is”. In other words, what kind of state of the memory is “the unnecessary memory”?
To make descriptions more concrete, let’s simplify the structure by assuming that there are only objects and links. This would look as shown in Figure 3.
Figure 3: Objects
The objects pointed to by global variables and the objects on the stack of a language are surely necessary. And objects pointed to by instance variables of these objects are also necessary. Furthermore, the objects that are reachable by following links from these objects are also necessary.
To put it more logically, the necessary objects are all objects which can be reached recursively via links from the “surely necessary objects” as the start points. This is depicted in figure 4. What are on the left of the line are all “surely necessary objects”, and the objects which can be reached from them are colored black. These objects colored black are the necessary objects. The rest of the objects can be released.
Figure 4: necessary objects and unnecessary objects
In technical terms, “the surely necessary objects” are called “the roots of GC”. That’s because they are the roots of the tree structures that emerge as a consequence of tracing necessary objects.
Mark and Sweep
GC was first implemented in Lisp. The GC first implemented in Lisp, in other words the world’s first GC, is called mark & sweep GC. The GC of ruby is one type of it.
The image of Mark-and-Sweep GC is pretty close to our definition of “necessary object”. First, put “marks” on the root objects. Setting them as the start points, put “marks” on all reachable objects. This is the mark phase.
At the moment when there’s no reachable object left to follow, check all objects in the object pool and release (sweep) all objects that have not been marked. “Sweep” is the “sweep” of Minesweeper.
There are two advantages.
1. There does not need to be any (or almost any) concern for garbage collection outside the implementation of GC.
2. Cycles can also be released. (As for cycles, see also the section on “Reference counting”)
There are also two disadvantages.
1. In order to sweep, every object must be touched at least once.
2. The load of the GC is concentrated at one point.
When using the emacs editor, “Garbage collecting...” sometimes appears and it completely stops reacting. That is an example of the second disadvantage. But this point can be alleviated by modifying the algorithm (it is called incremental GC).
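The two phases fit in a few lines once “objects and links” are modeled directly. The following is a minimal sketch of the idea only, with an invented RhgObj class. It is not how gc.c is written, which we will read later in this chapter.

```ruby
# Toy heap object: a name, outgoing links, and a mark bit.
class RhgObj
  attr_reader :name, :links
  attr_accessor :marked

  def initialize(name)
    @name = name
    @links = []
    @marked = false
  end
end

# Mark phase: flag everything reachable from the given object.
def rhg_mark(obj)
  return if obj.marked            # already visited (this also terminates on cycles)
  obj.marked = true
  obj.links.each { |child| rhg_mark(child) }
end

# Sweep phase: touch every object once, keep the marked ones, reset the marks.
def rhg_sweep(pool)
  live = pool.select(&:marked)
  live.each { |o| o.marked = false }
  live
end

def rhg_mark_and_sweep(pool, roots)
  roots.each { |r| rhg_mark(r) }
  rhg_sweep(pool)
end

def rhg_demo_mark_and_sweep
  a, b, c, d = %w[a b c d].map { |n| RhgObj.new(n) }
  a.links << b                    # a -> b, reachable from the root a
  c.links << d                    # c -> d -> c is a cycle that no root reaches
  d.links << c
  rhg_mark_and_sweep([a, b, c, d], [a]).map(&:name)
end

p rhg_demo_mark_and_sweep
```

Only a and b survive. The cycle between c and d is released without any special treatment, which is the second advantage listed above.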
Stop and Copy
Stop and Copy is a variation of Mark and Sweep. First, prepare several object areas. To simplify this description, assume there are two areas A and B here. And put an “active” mark on one of the areas. When creating an object, create it only in the “active” one. (Figure 5)
Figure 5: Stop and Copy (1)
When the GC starts, follow links from the roots in the same manner as mark-and-sweep. However, move objects to the other area instead of marking them (Figure 6). When all the links have been followed, discard all the elements which remain in A, and make B active next.
Figure 6: Stop and Copy (2)
Stop and Copy also has two advantages:
1. Compaction happens at the same time as collecting the memory
2. Since objects that reference each other move closer together, there’s more possibility of hitting the cache.
And also two disadvantages:
1. The object area needs to be more than twice as big
2. The positions of objects will be changed
It seems that not only positive things exist in this world.
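As a rough sketch of the copying step, assume that an “area” is just a Ruby array and that moving an object means creating a fresh copy in the new area. RhgCell and the forwarding table are names invented for this example.

```ruby
# Toy object for a copying collector: links refer to other RhgCell objects.
RhgCell = Struct.new(:name, :links)

# Copy everything reachable from roots into a new area.
# Returns [new_area, new_roots]; whatever was not copied is garbage.
def rhg_stop_and_copy(roots)
  forwarding = {}.compare_by_identity   # old object -> its copy
  to_space = []
  copy = lambda do |old|
    return forwarding[old] if forwarding.key?(old)   # copied already
    fresh = RhgCell.new(old.name, [])
    forwarding[old] = fresh
    to_space << fresh
    fresh.links = old.links.map { |child| copy.call(child) }
    fresh
  end
  new_roots = roots.map { |r| copy.call(r) }
  [to_space, new_roots]
end

def rhg_demo_stop_and_copy
  a = RhgCell.new("a", [])
  b = RhgCell.new("b", [])
  RhgCell.new("junk", [a])        # points at a live object but is itself unreachable
  a.links << b
  b.links << a                    # a live cycle is copied exactly once
  area_b, roots = rhg_stop_and_copy([a])
  [area_b.map(&:name), roots[0].equal?(a)]
end

p rhg_demo_stop_and_copy
```

The second element of the result shows the second disadvantage: the surviving root is a different object from the original a, so everything that pointed to the old position has to be updated.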
Reference counting
Reference counting differs a bit from the aforementioned GCs: the reach-check code is distributed in several places.
First, attach an integer count to each element. When referring via variables or arrays, the counter of the referenced object is increased. When a reference is dropped, the counter is decreased. When the counter of an object becomes zero, release the object. This is the method called reference counting (Figure 7).
Figure 7: Reference counting
This method also has two advantages:
1. The load of GC is distributed over the entire program.
2. The object that becomes unnecessary is immediately freed.
And also two disadvantages.
1. The counter handling tends to be forgotten.
2. When doing it naively cycles are not released.
I’ll explain the second point just in case. A cycle is a cycle of references as shown in Figure 8. If this is the case the counters will never reach zero and the objects will never be released.
Figure 8: Cycle
By the way, the latest Python (2.2) uses reference counting GC but it can free cycles. However, it is not because of the reference counting itself, but because it sometimes invokes mark and sweep GC to check.
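The cycle problem is easy to reproduce with a toy counter. In this sketch the class RhgCounted and its retain, release and point_to methods are invented for the illustration, and “freeing” an object only means writing its name into a log.

```ruby
class RhgCounted
  attr_reader :name, :count

  def initialize(name, freed_log)
    @name = name
    @refs = []
    @count = 0
    @freed_log = freed_log
  end

  def retain                      # somebody starts referring to this object
    @count += 1
    self
  end

  def release                     # somebody stops referring to this object
    @count -= 1
    return unless @count.zero?
    @freed_log << @name           # "free" the object ...
    @refs.each(&:release)         # ... and drop the references it was holding
    @refs.clear
  end

  def point_to(other)
    @refs << other.retain
  end
end

def rhg_demo_refcount
  freed = []
  # A chain x -> y held by one variable: releasing x frees both at once.
  x = RhgCounted.new("x", freed).retain
  y = RhgCounted.new("y", freed)
  x.point_to(y)
  x.release
  # A cycle p <-> q: after both variables let go, each counter is stuck at 1.
  p1 = RhgCounted.new("p", freed).retain
  q1 = RhgCounted.new("q", freed).retain
  p1.point_to(q1)
  q1.point_to(p1)
  p1.release
  q1.release
  [freed, p1.count, q1.count]
end

p rhg_demo_refcount
```

x and y are freed immediately (the second advantage), while p and q are never freed even though nothing outside the cycle refers to them any more.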
Object Management
Ruby’s garbage collection is only concerned with ruby objects. Moreover, it is only concerned with the objects created and managed by ruby. Conversely speaking, if memory is allocated without following a certain procedure, it won’t be taken care of. For instance, the following function will cause a memory leak even if ruby is running.
void not_ok()
{
    malloc(1024);  /* receive memory and discard it */
}
However, the following function does not cause a memory leak.
void this_is_ok()
{
    rb_ary_new();  /* create a ruby array and discard it */
}
Since rb_ary_new() uses Ruby’s proper interface to allocate memory, the created object is under the management of the GC of ruby, thus ruby will take care of it.
struct RVALUE
Since the substance of an object is a struct, managing objects means managing those structs. Of course the non-pointer objects like Fixnum, Symbol, nil, true and false are exceptions, but I won’t mention them every time, to keep the descriptions from becoming redundant.
Each struct type has a different size, but probably in order to keep management simpler, a union of all the structs of built-in classes is declared and the union is always used when dealing with memory. The declaration of that union is as follows.
▼ RVALUE
typedef struct RVALUE {
    union {
        struct {
            unsigned long flags;   /* 0 if not used */
            struct RVALUE *next;
        } free;
        struct RBasic  basic;
        struct RObject object;
        struct RClass  klass;
        struct RFloat  flonum;
        struct RString string;
        struct RArray  array;
        struct RRegexp regexp;
        struct RHash   hash;
        struct RData   data;
        struct RStruct rstruct;
        struct RBignum bignum;
        struct RFile   file;
        struct RNode   node;
        struct RMatch  match;
        struct RVarmap varmap;
        struct SCOPE   scope;
    } as;
} RVALUE;
(gc.c)
struct RVALUE is a struct that has only one element. I’ve heard that the reason why the union is not used directly is to make it easy to add members when debugging or when extending in the future.
First, let’s focus on the first element of the union, free.flags. The comment says “0 if not used”, but is it true? Is there no possibility for the flags of a live object to be 0 by chance?
As we’ve seen in Chapter 2: Objects, all object structs have struct RBasic as their first element. Therefore, whichever element of the union we access through, obj->as.free.flags means the same as obj->as.basic.flags. And objects always have the struct-type flag (such as T_STRING), and that flag is never 0. Therefore, the flags of an “alive” object will never coincidentally be 0. Hence, we can confirm that setting flags to 0 is necessary and sufficient to represent “dead” objects.
Object heap
The memory for all the object structs has been brought together in the global variable heaps. Hereafter, let’s call this an object heap.
▼ Object heap
#define HEAPS_INCREMENT 10
static RVALUE **heaps;
static int heaps_length = 0;
static int heaps_used   = 0;

#define HEAP_MIN_SLOTS 10000
static int *heaps_limits;
static int heap_slots = HEAP_MIN_SLOTS;
(gc.c)
heaps is an array of arrays of struct RVALUE. Since it is heapS, each contained array is probably a heap. Each element of a heap is a slot (Figure 9).
Figure 9: heaps, heap, slot
The length of heaps is heaps_length and it can be changed. The number of entries of heaps actually in use is heaps_used. The length of each heap is in the corresponding heaps_limits[index]. Figure 10 shows the structure of the object heap.
Figure 10: conceptual diagram of heaps in memory
There is a reason why this structure has to be this way. For instance, if all structs were stored in a single array, the memory space would be the most compact, but we could not realloc() it because that could change the addresses. This is because VALUEs are mere pointers.
In the case of an implementation of Java, the counterparts of VALUEs are not addresses but indexes of objects. Since they are handled through a pointer table, objects are movable. However in this case, indexing of the array comes in every time an object access occurs, and it lowers the performance to some degree.
On the other hand, what happens if it is a one-dimensional array of pointers to RVALUEs (that is, VALUEs)? This seems to work well at first glance, but it does not when it comes to GC. That is, as I’ll describe in detail, the GC of ruby needs to know the integers “which seem to be VALUEs (pointers to RVALUE)”. If all RVALUEs are allocated at addresses which are far from each other, it needs to compare every RVALUE address with every integer “which could be a pointer”. This means the time for GC becomes of the order of O(n^2) or more, which is not acceptable.
According to these requirements, it is good that the object heap forms a structure in which the addresses are cohesive to some extent, and whose position and total amount are not restricted at the same time.
freelist
Unused RVALUEs are managed by being linked in a single line, a linked list that starts with freelist. The as.free.next of RVALUE is the link used for this purpose.
▼ freelist
static RVALUE *freelist = 0;
(gc.c)
add_heap()
Now that we understand the data structure, let’s read the function add_heap() that adds a heap. Because this function contains a lot of lines that are not part of the main line, I’ll show a version simplified by omitting error handling and casts.
▼ add_heap() (simplified)
static void
add_heap()
{
    RVALUE *p, *pend;

    /* extend heaps if necessary */
    if (heaps_used == heaps_length) {
        heaps_length += HEAPS_INCREMENT;
        heaps        = realloc(heaps,        heaps_length * sizeof(RVALUE*));
        heaps_limits = realloc(heaps_limits, heaps_length * sizeof(int));
    }

    /* increase heaps by 1 */
    p = heaps[heaps_used] = malloc(sizeof(RVALUE) * heap_slots);
    heaps_limits[heaps_used] = heap_slots;
    pend = p + heap_slots;
    if (lomem == 0 || lomem > p) lomem = p;
    if (himem < pend) himem = pend;
    heaps_used++;
    heap_slots *= 1.8;

    /* link the allocated RVALUE to freelist */
    while (p < pend) {
        p->as.free.flags = 0;
        p->as.free.next = freelist;
        freelist = p;
        p++;
    }
}
Please check the following points.
1. the length of a heap is heap_slots
2. heap_slots becomes 1.8 times larger every time a heap is added
3. the length of heaps[i] (the value of heap_slots when creating that heap) is stored in heaps_limits[i]
Plus, since lomem and himem are modified only by this function, this function alone is enough to understand their mechanism. These variables hold the lowest and the highest addresses of the object heap. These values are used later when determining the integers “which seem to be VALUEs”.
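To get a feeling for the growth of heap_slots, the arithmetic can be replayed outside of ruby. This is just a sketch: the function name is invented, the starting value is the HEAP_MIN_SLOTS from the excerpt above, and to_i stands in for the truncation that happens when the result of `heap_slots *= 1.8` is stored back into an int.

```ruby
RHG_HEAP_MIN_SLOTS = 10_000     # value of HEAP_MIN_SLOTS in the excerpt above

# Sizes of the first `count` heaps, i.e. what ends up in heaps_limits[].
def rhg_heap_sizes(count)
  slots = RHG_HEAP_MIN_SLOTS
  Array.new(count) do
    current = slots
    slots = (slots * 1.8).to_i  # heap_slots *= 1.8, truncated to an integer
    current
  end
end

p rhg_heap_sizes(3)
p rhg_heap_sizes(3).sum         # total number of slots after three add_heap() calls
```

Each heap is bigger than the previous one, so a program that keeps many objects alive needs fewer and fewer add_heap() calls as it grows.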
rb_newobj()
Considering all of the above points, we can tell the way to create an object in a second. If there is at least one RVALUE linked from freelist, we can use it. Otherwise, do GC or increase the heaps. Let’s confirm this by reading the rb_newobj() function that creates an object.
▼ rb_newobj()
VALUE
rb_newobj()
{
    VALUE obj;

    if (!freelist) rb_gc();

    obj = (VALUE)freelist;
    freelist = freelist->as.free.next;
    MEMZERO((void*)obj, RVALUE, 1);
    return obj;
}
(gc.c)
If freelist is 0, in other words, if there are no unused structs, invoke GC and create space. Even if we could not collect any object, there’s no problem because in that case a new heap is allocated inside rb_gc(). Then take a struct from freelist, zero-fill it with MEMZERO(), and return it.
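The interplay of freelist, add_heap() and rb_newobj() can be imitated with a linked list of toy slots. Everything in this sketch (RhgSlot, RhgObjectSpace, the gc method that collects nothing) is an assumption made for the illustration. In particular the real rb_gc() first tries to collect garbage and adds a heap only when that did not free enough.

```ruby
# Toy slot: flags == 0 means "not used", as in the free member of RVALUE.
RhgSlot = Struct.new(:flags, :next_free)

class RhgObjectSpace
  attr_reader :gc_runs

  def initialize(slots_per_heap)
    @slots_per_heap = slots_per_heap
    @freelist = nil
    @gc_runs = 0
    add_heap
  end

  # Link every slot of a new heap onto the front of the freelist.
  def add_heap
    @slots_per_heap.times do
      @freelist = RhgSlot.new(0, @freelist)
    end
  end

  # Stand-in for rb_gc(): this model collects nothing, so it always has to grow.
  def gc
    @gc_runs += 1
    add_heap
  end

  # Counterpart of rb_newobj(): pop one slot off the freelist.
  def newobj
    gc if @freelist.nil?
    slot = @freelist
    @freelist = slot.next_free
    slot.flags = 0                # zero-fill, like MEMZERO()
    slot.next_free = nil
    slot
  end
end

def rhg_demo_newobj
  space = RhgObjectSpace.new(2)   # a tiny heap of 2 slots
  slots = Array.new(3) { space.newobj }
  [slots.uniq { |s| s.object_id }.size, space.gc_runs]
end

p rhg_demo_newobj
```

The third request finds the freelist empty, so the stand-in GC runs once and a new heap supplies the slot.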
Mark
As described, ruby’s GC is Mark & Sweep. Its “mark” is, concretely speaking, to set the FL_MARK flag: look for VALUEs that are in use, set the FL_MARK flag on the ones found, then after investigating everything look through the object heap and free the objects on which FL_MARK has not been set.
rb_gc_mark()
rb_gc_mark() is the function to mark objects recursively.
▼ rb_gc_mark()
void
rb_gc_mark(ptr)
    VALUE ptr;
{
    int ret;
    register RVALUE *obj = RANY(ptr);

    if (rb_special_const_p(ptr)) return;        /* special const not marked */
    if (obj->as.basic.flags == 0) return;       /* free cell */
    if (obj->as.basic.flags & FL_MARK) return;  /* already marked */

    obj->as.basic.flags |= FL_MARK;

    CHECK_STACK(ret);
    if (ret) {
        if (!mark_stack_overflow) {
            if (mark_stack_ptr - mark_stack < MARK_STACK_MAX) {
                *mark_stack_ptr = ptr;
                mark_stack_ptr++;
            }
            else {
                mark_stack_overflow = 1;
            }
        }
    }
    else {
        rb_gc_mark_children(ptr);
    }
}
(gc.c)
The definition of RANY() is as follows. It is not particularly important.
▼ RANY()
#define RANY(o) ((RVALUE*)(o))
(gc.c)
At the beginning there are the checks for non-pointers and already freed objects, and the check for already marked objects that stops the recursion. Then
obj->as.basic.flags |= FL_MARK;
and obj (this is the ptr parameter of this function) is marked. Then next, it’s the turn to follow the references from obj and mark them. rb_gc_mark_children() does it.
The rest, what starts with CHECK_STACK() and goes on for a while, is a device to prevent machine stack overflow. Since rb_gc_mark() uses recursive calls to mark objects, if there is a big object cluster, it is possible to run short of machine stack. To counter that, if the machine stack is about to overflow, it stops the recursive calls, piles up the objects on a global list, and later marks them once again. This code is omitted because it is not part of the main line.
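The idea of that device can be sketched as follows. Here a simple depth counter plays the role of CHECK_STACK(), and the list of parked objects is unbounded. Both are simplifications of mine: the real code has a fixed-size mark stack and a mark_stack_overflow flag whose handling is not shown in this chapter. The class names are invented.

```ruby
class RhgMarkNode
  attr_reader :children
  attr_accessor :marked

  def initialize
    @children = []
    @marked = false
  end
end

# Marker that refuses to recurse deeper than max_depth. Objects whose children
# could not be visited are parked on a list and processed afterwards.
class RhgBoundedMarker
  attr_reader :deferred_total

  def initialize(max_depth)
    @max_depth = max_depth
    @deferred = []
    @deferred_total = 0
  end

  def mark_from(roots)
    roots.each { |r| mark(r, 0) }
    until @deferred.empty?
      pending = @deferred.pop
      pending.children.each { |c| mark(c, 0) }   # restart with a fresh "stack"
    end
  end

  private

  def mark(obj, depth)
    return if obj.marked
    obj.marked = true               # the flag is set before the stack check, as in rb_gc_mark()
    if depth >= @max_depth          # stand-in for CHECK_STACK() reporting "nearly full"
      @deferred << obj
      @deferred_total += 1
    else
      obj.children.each { |c| mark(c, depth + 1) }
    end
  end
end

def rhg_demo_bounded_mark
  nodes = Array.new(10) { RhgMarkNode.new }
  nodes.each_cons(2) { |a, b| a.children << b }  # a chain of 10 objects
  marker = RhgBoundedMarker.new(3)
  marker.mark_from([nodes.first])
  [nodes.all?(&:marked), marker.deferred_total]
end

p rhg_demo_bounded_mark
```

Every object of the chain ends up marked although the recursion never goes deeper than three levels.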
rb_gc_mark_children()
Now, as for rb_gc_mark_children(), it just lists the internal types and marks them one by one, so it is not only long but also uninteresting. It is shown here with the simple enumerations omitted:
▼rb_gc_mark_children()
void
rb_gc_mark_children(ptr)
    VALUE ptr;
{
    register RVALUE *obj = RANY(ptr);

    if (FL_TEST(obj, FL_EXIVAR)) {
        rb_mark_generic_ivar((VALUE)obj);
    }

    switch (obj->as.basic.flags & T_MASK) {
      case T_NIL:
      case T_FIXNUM:
        rb_bug("rb_gc_mark() called for broken object");
        break;

      case T_NODE:
        mark_source_filename(obj->as.node.nd_file);
        switch (nd_type(obj)) {
          case NODE_IF:         /* 1,2,3 */
          case NODE_FOR:
          case NODE_ITER:
                /* ………… omitted ………… */
        }
        return;   /* not need to mark basic.klass */
    }

    rb_gc_mark(obj->as.basic.klass);
    switch (obj->as.basic.flags & T_MASK) {
      case T_ICLASS:
      case T_CLASS:
      case T_MODULE:
        rb_gc_mark(obj->as.klass.super);
        rb_mark_tbl(obj->as.klass.m_tbl);
        rb_mark_tbl(obj->as.klass.iv_tbl);
        break;

      case T_ARRAY:
        if (FL_TEST(obj, ELTS_SHARED)) {
            rb_gc_mark(obj->as.array.aux.shared);
        }
        else {
            long i, len = obj->as.array.len;
            VALUE *ptr = obj->as.array.ptr;

            for (i=0; i < len; i++) {
                rb_gc_mark(*ptr++);
            }
        }
        break;

            /* ………… omitted ………… */

      default:
        rb_bug("rb_gc_mark(): unknown data type 0x%x(0x%x) %s",
               obj->as.basic.flags & T_MASK, obj,
               is_pointer_to_heap(obj) ? "corrupted object" : "non object");
    }
}
(gc.c)
The only thing I’d like you to confirm is that it calls rb_gc_mark() recursively. In the omitted parts, the NODE types and the T_xxxx types are enumerated respectively. NODE will be introduced in Part 2.
Additionally, let’s look at the part that marks T_DATA (the struct used for extension libraries), because there’s something we’d like to check. This code is extracted from the second switch statement.
▼rb_gc_mark_children() – T_DATA
      case T_DATA:
        if (obj->as.data.dmark) (*obj->as.data.dmark)(DATA_PTR(obj));
        break;
(gc.c)
Here it does not use rb_gc_mark() or similar functions, but the dmark function supplied by the user. Inside it, of course, rb_gc_mark() may well be used, but it is also possible not to use it. For example, in the extreme case where a user-defined object contains no VALUE, there is nothing to mark.
rb_gc()
By now we have finished talking about marking each individual object. From now on, let’s look at rb_gc(), the function that presides over the whole. The objects marked here are the “objects which are obviously necessary”. In other words, “the roots of GC”.
▼rb_gc()
void
rb_gc()
{
    struct gc_list *list;
    struct FRAME * volatile frame; /* gcc 2.7.2.3 -O2 bug??  */
    jmp_buf save_regs_gc_mark;
    SET_STACK_END;

    if (dont_gc || during_gc) {
        if (!freelist) {
            add_heap();
        }
        return;
    }

    /* …… mark from all the roots …… */

    gc_sweep();
}
(gc.c)
The roots that should be marked will be shown one by one below, but I’d like to mention just one point here.
In ruby, the CPU registers and the machine stack are also roots. This means that the local variables and arguments of C are marked automatically. For example,
static int
f(void)
{
    VALUE arr = rb_ary_new();

    /* …… do various things …… */
}
in this way we can protect an object just by putting it into a variable. This is a very significant trait of ruby’s GC. Thanks to this feature, ruby’s extension libraries are extremely easy to write.
However, what is on the stack is not only VALUEs. There are also a lot of totally unrelated values. How this is resolved is the key point when reading the implementation of the GC.
The Ruby Stack
First, it marks the (ruby’s) stack frames used by the interpreter. What they are will become clear once you reach Part 3, so you don’t have to think too much about them for now.
▼Marking the Ruby Stack
    /* mark frame stack */
    for (frame = ruby_frame; frame; frame = frame->prev) {
        rb_gc_mark_frame(frame);
        if (frame->tmp) {
            struct FRAME *tmp = frame->tmp;
            while (tmp) {
                rb_gc_mark_frame(tmp);
                tmp = tmp->prev;
            }
        }
    }
    rb_gc_mark((VALUE)ruby_class);
    rb_gc_mark((VALUE)ruby_scope);
    rb_gc_mark((VALUE)ruby_dyna_vars);
(gc.c)
ruby_frame, ruby_class, ruby_scope and ruby_dyna_vars are the variables that point to the top of each stack of the evaluator. They hold, respectively, the frame, the class scope, the local variable scope, and the block local variables at that time.
Register
Next, it marks the CPU registers.
▼marking the registers
    FLUSH_REGISTER_WINDOWS;
    /* Here, all registers must be saved into jmp_buf. */
    setjmp(save_regs_gc_mark);
    mark_locations_array((VALUE*)save_regs_gc_mark,
                         sizeof(save_regs_gc_mark) / sizeof(VALUE *));
(gc.c)
FLUSH_REGISTER_WINDOWS is special. We will see it later.
setjmp() is essentially a function for non-local jumps, but as a side effect it saves the contents of the registers into its argument (a variable of type jmp_buf). Making use of this, the code attempts to mark the contents of the registers. Things around here really look like secret techniques.
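The trick of using setjmp() purely for its side effect can be tried in isolation. The following is only a sketch: the callback type and the toy_ names are hypothetical, and what a particular C library actually stores in jmp_buf (and in what form) is platform-dependent, so the words visited are only candidates for register contents.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback type: called once per word found in the buffer. */
typedef void (*word_fn)(uintptr_t word, void *arg);

/* Save the registers with setjmp() and visit the saved words one by one.
   Returns the number of whole words visited. */
static size_t toy_scan_registers(word_fn fn, void *arg)
{
    jmp_buf regs;
    size_t i, n = sizeof(regs) / sizeof(uintptr_t);

    assert(fn != NULL);
    memset(&regs, 0, sizeof(regs));
    (void)setjmp(regs);               /* used only for its side effect */

    for (i = 0; i < n; i++) {
        uintptr_t w;

        /* memcpy avoids alignment and aliasing problems */
        memcpy(&w, (const char *)&regs + i * sizeof(w), sizeof(w));
        fn(w, arg);
    }
    return n;
}

/* Example callback: just count the words. */
static void toy_count_word(uintptr_t word, void *arg)
{
    (void)word;
    ++*(size_t *)arg;
}
```

A conservative collector would replace the counting callback with a test such as is_pointer_to_heap() followed by marking.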
However, only djgpp and Human68k are treated specially. djgpp is a gcc environment for DOS. Human68k is the OS of the SHARP X680x0 series. In these two environments, the ordinary setjmp() apparently does not save all the registers, so setjmp() is redefined as follows in inline assembler to write the registers out explicitly.
▼the original version of setjmp
#ifdef __GNUC__
#if defined(__human68k__) || defined(DJGPP)
#if defined(__human68k__)
typedef unsigned long rb_jmp_buf[8];
__asm__ (".even\n\                   2-byte alignment
_rb_setjmp:\n\                       the label of rb_setjmp() function
        move.l  4(sp),a0\n\          load the first argument to the a0 register
        movem.l d3-d7/a3-a5,(a0)\n\  copy the registers to where a0 points to
        moveq.l #0,d0\n\             set 0 to d0 (as the return value)
        rts");                       return
#ifdef setjmp
#undef setjmp
#endif
#else
#if defined(DJGPP)
typedef unsigned long rb_jmp_buf[6];
__asm__ (".align 4\n\                order 4-byte alignment
_rb_setjmp:\n\                       the label for rb_setjmp() function
        pushl   %ebp\n\              push ebp to the stack
        movl    %esp,%ebp\n\         set the stack pointer to ebp
        movl    8(%ebp),%ebp\n\      pick up the first argument and set to ebp
        movl    %eax,(%ebp)\n\       in the followings, store each register
        movl    %ebx,4(%ebp)\n\      to where ebp points to
        movl    %ecx,8(%ebp)\n\
        movl    %edx,12(%ebp)\n\
        movl    %esi,16(%ebp)\n\
        movl    %edi,20(%ebp)\n\
        popl    %ebp\n\              restore ebp from the stack
        xorl    %eax,%eax\n\         set 0 to eax (as the return value)
        ret");                       return
#endif
#endif
int rb_setjmp (rb_jmp_buf);
#define jmp_buf rb_jmp_buf
#define setjmp rb_setjmp
#endif /* __human68k__ or DJGPP */
#endif /* __GNUC__ */
(gc.c)
Alignment is a constraint on where variables can be placed in memory. For example, on a 32-bit machine an int is usually 32 bits, but we cannot necessarily fetch 32 bits from an arbitrary address. RISC machines in particular have strict constraints, such as “from a multiple of 4 bytes” or “from an even byte”. With such constraints the memory access unit can be simplified (and thus made faster). The constraint “from a multiple of 4 bytes” is called “4-byte alignment”.
Plus, with the cc of djgpp or Human68k there is a rule that the compiler prepends an underscore to each function name. Therefore, when writing a C function in assembler, we have to prepend the underscore (_) ourselves. This kind of rule is a technique for avoiding name conflicts with library functions. It is said that on UNIX, too, the underscore was attached until some time ago, but it has almost disappeared now.
Now that the contents of the registers have been written out into jmp_buf, they are marked by the next code:
▼mark the registers (shown again)
    mark_locations_array((VALUE*)save_regs_gc_mark,
                         sizeof(save_regs_gc_mark) / sizeof(VALUE *));
(gc.c)
This is the first time that mark_locations_array() appears. I’ll describe it in the next section.
mark_locations_array()
▼mark_locations_array()
static void
mark_locations_array(x, n)
    register VALUE *x;
    register long n;
{
    while (n--) {
        if (is_pointer_to_heap((void *)*x)) {
            rb_gc_mark(*x);
        }
        x++;
    }
}
(gc.c)
This function marks all the elements of an array, but it differs slightly from the previous mark functions. Until now, every place to be marked was one that we knew for sure held a VALUE (a pointer to an object). This time, however, what it attempts to mark is the register space, where values that are not VALUEs are fully to be expected. To cope with that, it first checks whether the value looks like a VALUE (a pointer), and only if it does is the value handled as a pointer. This kind of method is called “conservative GC”. It is presumably called conservative because it “tentatively errs on the safe side”.
Next, we’ll look at the function that checks whether something “looks like a VALUE”: is_pointer_to_heap().
is_pointer_to_heap()
▼is_pointer_to_heap()
static inline int
is_pointer_to_heap(ptr)
    void *ptr;
{
    register RVALUE *p = RANY(ptr);
    register RVALUE *heap_org;
    register long i;

    if (p < lomem || p > himem) return Qfalse;

    /* check if there's the possibility that p is a pointer */
    for (i=0; i < heaps_used; i++) {
        heap_org = heaps[i];
        if (heap_org <= p && p < heap_org + heaps_limits[i] &&
            ((((char*)p)-((char*)heap_org))%sizeof(RVALUE)) == 0)
            return Qtrue;
    }
    return Qfalse;
}
(gc.c)
Briefly, it does the following:
- check that the value lies between the lowest and the highest address where RVALUEs reside
- check that it is within the range of one of the heaps
- make sure the value points to the head of an RVALUE
With a mechanism like this, it is obviously possible for a non-VALUE value to be mistakenly handled as a VALUE. But at least it will never fail to find the VALUEs in use. And with this many tests, it will rarely pick up a non-VALUE value unless one is crafted intentionally. Therefore, considering the benefits we obtain from GC, this compromise is acceptable.
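The same tests can be sketched for a single, statically allocated heap. The slot type and the pt_ names are assumptions for illustration. With only one heap, the lomem/himem pre-check and the per-heap range check collapse into one comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for RVALUE. */
typedef struct slot { unsigned long flags; void *a; void *b; } slot;

#define PT_SLOTS 8                    /* hypothetical heap size */
static slot pt_heap[PT_SLOTS];

/* Returns 1 when p could be the address of a slot in the heap, else 0. */
static int pt_is_pointer_to_heap(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)&pt_heap[0];
    uintptr_t hi = (uintptr_t)&pt_heap[PT_SLOTS];   /* one past the end */

    assert(lo < hi);
    if (addr < lo || addr >= hi) return 0;          /* outside the heap */
    if ((addr - lo) % sizeof(slot) != 0) return 0;  /* not a slot head */
    return 1;
}
```

An integer that happens to equal the address of a slot still passes the test. That false positive only keeps an object alive longer than necessary, which is the sense in which the approach is conservative.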
Register Window
This section is about FLUSH_REGISTER_WINDOWS, which was deferred earlier.
Register windows are a mechanism that places part of the machine stack inside the CPU. In short, it is a cache with a narrowly defined purpose. Nowadays it exists only in the Sparc architecture. There may be VALUEs in the register windows as well, so they too have to be flushed out to memory.
The content of the macro is like this:
▼FLUSH_REGISTER_WINDOWS
#if defined(sparc) || defined(__sparc__)
#if defined(linux) || defined(__linux__)
#define FLUSH_REGISTER_WINDOWS  asm("ta  0x83")
#else /* Solaris, not sparc linux */
#define FLUSH_REGISTER_WINDOWS  asm("ta  0x03")
#endif
#else /* Not a sparc */
#define FLUSH_REGISTER_WINDOWS
#endif
(defines.h)
asm(...) is the built-in assembler. However, even though I call it assembler, this instruction named ta is a call of a privileged instruction. In other words, the call is not to the CPU but to the OS. That’s why the instruction differs for each OS. The comments mention only Linux and Solaris, but FreeBSD and NetBSD also run on Sparc, so the comment is wrong.
Plus, if it is not Sparc, no flush is needed, so FLUSH_REGISTER_WINDOWS is defined as nothing. Reducing a macro to nothing like this is a very well-known technique that is also convenient when debugging.
Machine Stack
Then, let’s go back to the rest of rb_gc(). This time, it marks the VALUEs in the machine stack.
▼mark the machine stack
    rb_gc_mark_locations(rb_gc_stack_start, (VALUE*)STACK_END);
#if defined(__human68k__)
    rb_gc_mark_locations((VALUE*)((char*)rb_gc_stack_start + 2),
                         (VALUE*)((char*)STACK_END + 2));
#endif
(gc.c)
rb_gc_stack_start appears to be the start address (the tail end of the stack) and STACK_END the end address (the top). And rb_gc_mark_locations() is what actually marks the stack region.
rb_gc_mark_locations() is called twice in order to deal with architectures that are not 4-byte aligned. rb_gc_mark_locations() tries to mark in units of sizeof(VALUE), so in a 2-byte alignment environment it may fail to mark properly. In that case it shifts the range by 2 bytes and marks again.
Now let’s examine rb_gc_stack_start, STACK_END and rb_gc_mark_locations(), in this order.
Init_stack()
The first is rb_gc_stack_start. This variable is set only during Init_stack(). As the name Init_ might suggest, this function is called when the ruby interpreter is initialized.
▼Init_stack()
void
Init_stack(addr)
    VALUE *addr;
{
#if defined(__human68k__)
    extern void *_SEND;
    rb_gc_stack_start = _SEND;
#else
    VALUE start;

    if (!addr) addr = &start;
    rb_gc_stack_start = addr;
#endif
#ifdef HAVE_GETRLIMIT
    {
        struct rlimit rlim;

        if (getrlimit(RLIMIT_STACK, &rlim) == 0) {
            double space = (double)rlim.rlim_cur*0.2;

            if (space > 1024*1024) space = 1024*1024;
            STACK_LEVEL_MAX = (rlim.rlim_cur - space) / sizeof(VALUE);
        }
    }
#endif
}
(gc.c)
Only the middle part is important. It defines an arbitrary local variable (which is allocated on the stack) and sets its address to rb_gc_stack_start. The _SEND in the code for __human68k__ is probably a variable defined by a library of the compiler or the system. Naturally, you can presume that it is a contraction of Stack END.
Meanwhile, the code after that, enclosed by HAVE_GETRLIMIT, appears to check the size of the stack and do mysterious things. This is in the same vein as what rb_gc_mark() does to prevent stack overflow. We can ignore it.
STACK_END
Next, we’ll look at STACK_END, the macro that detects the end of the stack.
▼STACK_END
#ifdef C_ALLOCA
#define SET_STACK_END VALUE stack_end; alloca(0);
#define STACK_END (&stack_end)
#else
#if defined(__GNUC__) && defined(USE_BUILTIN_FRAME_ADDRESS)
#define SET_STACK_END  VALUE *stack_end = __builtin_frame_address(0)
#else
#define SET_STACK_END  VALUE *stack_end = alloca(1)
#endif
#define STACK_END (stack_end)
#endif
(gc.c)
As there are three variations of SET_STACK_END, let’s start with the bottom one. alloca() allocates space at the end of the stack and returns it, so its return value and the end address of the stack should be very close. Hence the return value of alloca() is taken as an approximation of the end of the stack.
Let’s go back and look at the one at the top. When the macro C_ALLOCA is defined, alloca() is not natively available; in other words, a compatible function is defined in C. I mentioned that in this case alloca() internally allocates memory with malloc(). That, however, does not help at all in getting the position of the stack. To deal with this situation, the code judges that the local variable stack_end of the currently executing function is close to the end of the stack and uses its address (&stack_end).
Plus, this code contains an alloca(0) whose purpose is not easy to see. This has long been a feature of the alloca() defined in C, and it means “please check and free the unused space”. Since this is used while doing GC, it tries to free the memory allocated with alloca() at the same time. But I think it would be better to put it in a separate macro rather than mixing it into a place like this…
Finally, the middle case concerns __builtin_frame_address(). __GNUC__ is a symbol defined by gcc (the GNU C compiler). Since the case is restricted by it, this is a built-in of gcc. __builtin_frame_address(n) gives the address of the stack frame n frames back, and __builtin_frame_address(0) gives the address of the current frame.
rb_gc_mark_locations()
The last one is rb_gc_mark_locations(), the function that actually marks the stack.
▼rb_gc_mark_locations()
void
rb_gc_mark_locations(start, end)
    VALUE *start, *end;
{
    VALUE *tmp;
    long n;

    if (start > end) {
        tmp = start;
        start = end;
        end = tmp;
    }
    n = end - start + 1;
    mark_locations_array(start,n);
}
(gc.c)
Basically, delegating to mark_locations_array(), which marks a region, is sufficient. What this function does is adjust the arguments properly. The adjustment is required because the direction in which the machine stack grows is not fixed. If the machine stack grows toward lower addresses, end is smaller; if it grows toward higher addresses, start is smaller. So they are adjusted here so that the smaller one becomes start.
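The argument adjustment is easy to check on an ordinary array. In this sketch the function name is hypothetical and summing the words stands in for marking them. The range is inclusive, as in rb_gc_mark_locations().

```c
#include <assert.h>
#include <stddef.h>

/* Visit every word in the inclusive range, whichever end is lower.
   Adds each word to *sum and returns the number of words visited. */
static size_t toy_scan_range(const unsigned long *start,
                             const unsigned long *end,
                             unsigned long *sum)
{
    const unsigned long *tmp;
    size_t n, i;

    assert(start != NULL && end != NULL && sum != NULL);
    if (start > end) {          /* stack grows downward: swap the ends */
        tmp = start;
        start = end;
        end = tmp;
    }
    n = (size_t)(end - start) + 1;
    for (i = 0; i < n; i++) {
        *sum += start[i];
    }
    return n;
}
```

Called with the two ends in either order, the function visits the same words, so the caller does not need to know the stack direction.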
The other root objects
Finally, it marks the built-in VALUE containers of the interpreter.
▼The other roots
    /* mark the registered global variables */
    for (list = global_List; list; list = list->next) {
        rb_gc_mark(*list->varptr);
    }
    rb_mark_end_proc();
    rb_gc_mark_global_tbl();

    rb_mark_tbl(rb_class_tbl);
    rb_gc_mark_trap_list();

    /* mark the instance variables of true, false, etc if exist */
    rb_mark_generic_ivar_tbl();

    /* mark the variables used in the ruby parser (only while parsing) */
    rb_gc_mark_parser();
(gc.c)
When putting a VALUE into a global variable of C, the user is required to register its address via rb_gc_register_address(). As these addresses are saved in global_List, all of them are marked.
rb_mark_end_proc() marks the procedural objects which are registered via a kind of END statement of Ruby and executed when a program finishes. (END statements will not be described in this book.)
rb_gc_mark_global_tbl() marks the global variable table rb_global_tbl. (See also the next chapter “Variables and Constants”.)
rb_mark_tbl(rb_class_tbl) marks rb_class_tbl, which was discussed in the previous chapter.
rb_gc_mark_trap_list() marks the procedural objects which are registered via Ruby’s function-like method trap. (This is related to signals and will also not be described in this book.)
rb_mark_generic_ivar_tbl() marks the instance variable table prepared for non-pointer VALUEs such as true.
rb_gc_mark_parser() marks the semantic stack of the parser. (The semantic stack will be described in Part 2.)
With this, the mark phase is finished.
Sweep
The special treatment for NODE
The sweep phase is the procedure that finds and frees the unmarked objects. But, for some reason, the objects of type T_NODE are treated specially. Take a look at the next part:
▼at the beginning of gc_sweep()
static void
gc_sweep()
{
    RVALUE *p, *pend, *final_list;
    int freed = 0;
    int i, used = heaps_used;

    if (ruby_in_compile && ruby_parser_stack_on_heap()) {
        /* If the yacc stack is not on the machine stack,
           do not collect NODE while parsing */
        for (i = 0; i < used; i++) {
            p = heaps[i]; pend = p + heaps_limits[i];
            while (p < pend) {
                if (!(p->as.basic.flags & FL_MARK) &&
                                    BUILTIN_TYPE(p) == T_NODE)
                    rb_gc_mark((VALUE)p);
                p++;
            }
        }
    }
(gc.c)
NODE is an object that expresses a program in the parser. While compiling, NODEs are put on the stack prepared by a tool named yacc, but that stack is not always on the machine stack. Concretely speaking, when ruby_parser_stack_on_heap() is true, it indicates that the stack is not on the machine stack. In this case a NODE could be accidentally collected in the middle of its creation, so the objects of type T_NODE are unconditionally marked and protected from being collected while compiling (ruby_in_compile).
Finalizer
Once it has reached here, all unmarked objects can be freed. However, there’s one thing to do before freeing. In Ruby the freeing of objects can be hooked, and the hooks have to be called. Such a hook is called a “finalizer”.
▼gc_sweep() Middle
    freelist = 0;
    final_list = deferred_final_list;
    deferred_final_list = 0;
    for (i = 0; i < used; i++) {
        int n = 0;

        p = heaps[i]; pend = p + heaps_limits[i];
        while (p < pend) {
            if (!(p->as.basic.flags & FL_MARK)) {
(A)             if (p->as.basic.flags) {
                    obj_free((VALUE)p);
                }
(B)             if (need_call_final && FL_TEST(p, FL_FINALIZE)) {
                    p->as.free.flags = FL_MARK; /* remains marked */
                    p->as.free.next = final_list;
                    final_list = p;
                }
                else {
                    p->as.free.flags = 0;
                    p->as.free.next = freelist;
                    freelist = p;
                }
                n++;
            }
(C)         else if (RBASIC(p)->flags == FL_MARK) {
                /* the objects that need to finalize */
                /* are left untouched */
            }
            else {
                RBASIC(p)->flags &= ~FL_MARK;
            }
            p++;
        }
        freed += n;
    }
    if (freed < FREE_MIN) {
        add_heap();
    }
    during_gc = 0;
(gc.c)
This scans the whole object heap from end to end and frees the objects on which the FL_MARK flag is not set, using obj_free() (A). obj_free() frees, for instance, only the char[] used by String objects or the VALUE[] used by Array objects; it does not free the RVALUE struct and does not touch basic.flags at all. Therefore, manipulating a struct after obj_free() has been called does not risk a crash.
After freeing the objects, it branches on the FL_FINALIZE flag (B). If FL_FINALIZE is set on an object, at least one finalizer is defined on it, so the object is added to final_list. Otherwise, the object is immediately added to freelist. When an object is to be finalized, basic.flags becomes FL_MARK. This clears the struct-type flag (such as T_STRING), so the object can be distinguished from live objects.
Then this phase completes by executing all the finalizers. Notice that the hooked objects have already died when the finalizers are called. This means that the hooked objects cannot be used while the finalizers run.
▼gc_sweep() the rest
    if (final_list) {
        RVALUE *tmp;

        if (rb_prohibit_interrupt || ruby_in_compile) {
            deferred_final_list = final_list;
            return;
        }

        for (p = final_list; p; p = tmp) {
            tmp = p->as.free.next;
            run_final((VALUE)p);
            p->as.free.flags = 0;
            p->as.free.next = freelist;
            freelist = p;
        }
    }
}
(gc.c)
The for in the latter half is the main finalizing procedure. The if in the first half handles the case where execution could not be transferred to the Ruby program for various reasons. The objects whose finalization is deferred will show up in route (C) of the previous listing.
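The overall shape of the sweep, including the detour through the finalizer list, can be condensed into a toy version. Everything prefixed sw_ is a stand-in invented for illustration, and deferred finalization (route (C)), obj_free() and add_heap() are left out.

```c
#include <assert.h>
#include <stddef.h>

#define SW_MARK     1UL   /* stand-in for FL_MARK */
#define SW_FINALIZE 2UL   /* stand-in for FL_FINALIZE */
#define SW_LIVE     4UL   /* stand-in for the type bits of a live object */

typedef struct sw_cell {
    unsigned long flags;
    struct sw_cell *next;
} sw_cell;

static sw_cell *sw_freelist = NULL;
static int sw_finalized = 0;          /* counts finalizer runs */

/* Placeholder for run_final(): a real finalizer would call user code. */
static void sw_run_final(sw_cell *c) { (void)c; sw_finalized++; }

/* One sweep over heap[0..n): unmarked cells go to the free list, directly
   or via the finalizer list; marked cells just lose their mark. */
static void sw_sweep(sw_cell *heap, size_t n)
{
    sw_cell *final_list = NULL, *p, *tmp;
    size_t i;

    assert(heap != NULL || n == 0);
    sw_freelist = NULL;
    for (i = 0; i < n; i++) {
        p = &heap[i];
        if (!(p->flags & SW_MARK)) {
            if (p->flags & SW_FINALIZE) {
                p->flags = SW_MARK;   /* dead, but kept until finalized */
                p->next = final_list;
                final_list = p;
            }
            else {
                p->flags = 0;
                p->next = sw_freelist;
                sw_freelist = p;
            }
        }
        else {
            p->flags &= ~SW_MARK;     /* survivor: ready for the next GC */
        }
    }
    for (p = final_list; p; p = tmp) {
        tmp = p->next;                /* read before the cell is relinked */
        sw_run_final(p);
        p->flags = 0;
        p->next = sw_freelist;
        sw_freelist = p;
    }
}
```

Note that the free list is rebuilt from scratch on every sweep, so cells that were already free are simply linked in again.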
rb_gc_force_recycle()
Finally, I’ll talk about something a little different. Until now, ruby’s garbage collector has decided whether or not to collect each object, but there’s also a way for users to explicitly have a particular object collected. It’s rb_gc_force_recycle().
▼rb_gc_force_recycle()
void
rb_gc_force_recycle(p)
    VALUE p;
{
    RANY(p)->as.free.flags = 0;
    RANY(p)->as.free.next = freelist;
    freelist = RANY(p);
}
(gc.c)
Its mechanism is not so special, but I introduced it because you’ll see it several times in Part 2 and Part 3.
Discussions
To free spaces
The space allocated by an individual object, say the char[] of a String, is freed during the sweep phase, but the code that frees the RVALUE struct itself has not appeared yet. Nor does the object heap keep track of things like the number of structs in use. This means that once ruby’s object space has been allocated, it is never freed.
For example, the mailer I’m writing now temporarily uses almost 40 Mbytes when constructing the threads for 500 mails, and even if most of that space becomes unused as a result of GC, the process keeps occupying the 40 Mbytes. My machine is fairly modern, so it does not matter if a mere 40 Mbytes are used. But if this happens in a server that keeps running, it may become a problem.
However, one also needs to consider that free() does not always reduce the amount of memory in use. Unless memory is returned to the OS, the memory usage of the process never decreases. And, depending on the implementation of malloc(), calling free() often does not return memory to the OS.
… I had written so, but just before the deadline of this book, RVALUEs came to be freed. The attached CD-ROM also contains the edge ruby, so please check with diff. … What a sad ending.
Generational GC
Mark & Sweep has a weak point: “it needs to touch the entire object space at least once”. The idea of Generational GC may be able to make up for that weak point.
The foundation of Generational GC is the empirical rule that “most objects live either for a very long time or for a very short time”. You may be convinced of this by thinking for a few seconds about the programs you write.
Thinking along this rule, one may come up with the idea that “long-lived objects do not need to be marked or swept each and every time”. Once an object is judged to be long-lived, it is treated specially and excluded from the GC targets. Then the number of target objects can be reduced significantly for both marking and sweeping. For example, if half of the objects are long-lived at a particular GC, the number of target objects is halved.
There’s a problem, though. Generational GC is very difficult to do if objects can’t be moved. That is because the long-lived objects need, as I just wrote, to “be treated specially”. Generational GC reduces cost by reducing the number of objects dealt with, so if the generation an object belongs to is not clearly categorized, it ends up equivalent to dealing with both generations. Furthermore, ruby’s GC is also a conservative GC, so it has to be built so that is_pointer_to_heap() still works. This is particularly difficult.
How was this problem solved? By the hand of Mr. Kiyama Masato, an implementation of Generational GC for ruby has been published. I’ll briefly describe how this patch deals with each problem. And this time, by courtesy of Mr. Kiyama, this Generational GC patch and its paper are contained in the attached CD-ROM. (See also doc/generational-gc.html)
Then, I shall start the explanation. To make the explanation easier, from now on the long-lived objects are called “old-generation objects” and the short-lived objects “new-generation objects”.
First, the biggest problem: the special treatment of the old-generation objects. This is resolved by linking only the new-generation objects into a list named newlist. The list is realized by adding a member to RVALUE.
Second, how to detect the old-generation objects. This is done very simply: the newlist objects that were not garbage collected are removed from newlist. In other words, once an object survives a GC, it is treated as an old-generation object.
Third, how to detect the references from old-generation objects to new-generation objects. In Generational GC, the old-generation objects stay, so to speak, in the marked state. However, when there is a link from the old generation to the new generation, the new-generation object will not be marked. (Figure 11)
Figure 11: reference over generations
This is not good, so at the moment an old-generation object refers to a new-generation object, the new-generation object must be turned into an old-generation one. The patch modifies the libraries and adds checks wherever such a reference can arise.
This is the outline of its mechanism. The patch was scheduled to be included in ruby 1.7, but it has not been included yet. The reason is said to be its speed. It is inferred that the cost of the third point, “check all references”, matters, but the precise cause has not been determined.
Compaction
Could ruby’s GC do compaction? Since a VALUE of ruby is a direct pointer to a struct, if the address of a struct changes because of compaction, all the VALUEs that point to the moved struct have to be changed.
However, since ruby’s GC is a conservative GC, “the case when it is impossible to determine whether or not something is really a VALUE” can occur. If the value is rewritten in that situation and it was not a VALUE, something awful will happen. Compaction and conservative GC are really incompatible.
But let’s contrive countermeasures in one way or another. The first way is to let VALUE be an object ID instead of a pointer. (Figure 12) It means sandwiching an indirection layer between VALUE and the struct. In this way, as it is not necessary to rewrite VALUE, structs can be moved safely. But as trade-offs, access becomes slower and compatibility with extension libraries is lost.
Figure 12: reference through the object ID
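The object-ID idea amounts to a handle table. A minimal sketch, with made-up ht_ names and a fixed-size table, shows why moving a struct then needs only one update:

```c
#include <assert.h>
#include <stddef.h>

#define HT_MAX 8                       /* hypothetical table size */

typedef struct ht_obj { long value; } ht_obj;
typedef size_t ht_id;                  /* what a VALUE would become */

static ht_obj *ht_table[HT_MAX];       /* object ID -> current address */
static size_t ht_used = 0;

/* Register an object and hand out its ID. */
static ht_id ht_register(ht_obj *o)
{
    assert(ht_used < HT_MAX);
    ht_table[ht_used] = o;
    return ht_used++;
}

/* Every access pays one extra lookup. */
static ht_obj *ht_deref(ht_id id)
{
    assert(id < ht_used);
    return ht_table[id];
}

/* Compaction copies the struct and patches only the table entry. */
static void ht_move(ht_id id, ht_obj *new_home)
{
    *new_home = *ht_deref(id);
    ht_table[id] = new_home;
}
```

Holders of an ht_id never see the address change, which is the property compaction needs. The extra lookup in ht_deref() is the access cost mentioned in the text.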
Then, the next way is to allow moving a struct only when it is pointed to solely by pointers that are “surely VALUE” (Figure 13). This method is called Mostly-copying garbage collection. In ordinary programs there are not so many objects for which is_pointer_to_heap() is true, so the probability of being able to move the object structs is quite high.
Figure 13: Mostly-copying garbage collection
Better still, once structs can be moved, the implementation of Generational GC becomes simple at the same time. It seems worth a try.
volatile to protect from GC
I wrote that GC takes care of the VALUEs on the stack, so if a VALUE is held in a local variable, it should certainly be marked. But in reality, due to the effects of optimization, the variable may disappear. For example, it may disappear in the following case:
VALUE str;
str = rb_str_new2("...");
printf("%s\n", RSTRING(str)->ptr);
Because this code does not access str itself, some compilers keep only str->ptr in memory and discard str. If that happens, str is collected and the process crashes. In this case there is no choice but to write
volatile VALUE str;
this way. volatile is a reserved word of C, and it has the effect of forbidding optimizations that involve this variable. If you find volatile in Ruby-related code, you can assume almost certainly that it is there for GC. When I read K&R, I thought “what is the use of this?”, and never expected to see so many of them in ruby.
Considering these aspects, the promise of conservative GC, “users don’t have to care about GC”, seems not always to hold. There was once a discussion that “the GC of a Scheme implementation named KSM does not need volatile”, but it seems it could not be applied to ruby because its algorithm has a hole.
When to invoke
Inside gc.c
When is GC invoked? Inside gc.c, there are three places that call rb_gc():
ruby_xmalloc()
ruby_xrealloc()
rb_newobj()
As for ruby_xmalloc() and ruby_xrealloc(), it is when memory allocation fails. Doing GC may free memory, so space may become available again. rb_newobj() is in a similar situation: it invokes GC when freelist becomes empty.
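The “collect, then retry” pattern used on allocation failure can be sketched as follows. This is not the code of ruby_xmalloc(): the xm_ names are hypothetical, the collector is a stub that only counts its calls, the allocator is reached through a function pointer so that failures can be simulated, and the sketch simply returns NULL if the retry fails too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int xm_gc_runs = 0;           /* counts calls to the toy collector */

/* Hypothetical stand-in for rb_gc(): a real collector would free memory. */
static void xm_gc(void) { xm_gc_runs++; }

/* Allocator hook so the failure path can be exercised; defaults to malloc. */
static void *(*xm_alloc)(size_t) = malloc;

/* Test helper: fail the next xm_fail_next requests, then use malloc. */
static int xm_fail_next = 0;
static void *xm_flaky_alloc(size_t size)
{
    if (xm_fail_next > 0) { xm_fail_next--; return NULL; }
    return malloc(size);
}

/* Try once, collect on failure, then try once more. */
static void *xm_malloc(size_t size)
{
    void *mem;

    assert(xm_alloc != NULL);
    mem = xm_alloc(size);
    if (!mem) {
        xm_gc();                     /* may make memory available again */
        mem = xm_alloc(size);        /* NULL here means GC did not help */
    }
    return mem;
}
```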
Inside the interpreter
There are several places outside gc.c where the interpreter calls rb_gc().
First, in io.c and dir.c, when it runs out of file descriptors and cannot open a file, it invokes GC. If IO objects are garbage collected, the files may be closed and file descriptors may become available.
In ruby.c, rb_gc() is sometimes called after loading a file. As I mentioned in the Sweep section above, this compensates for the fact that NODEs cannot be garbage collected while compiling.
Object Creation
We’ve finished with GC and can now deal with Ruby objects from their creation to their freeing. So here I’d like to describe object creation. This is not so much related to GC; rather, it is somewhat related to the discussion of classes in the previous chapter.
Allocation Framework
We’ve created objects many times. For example, in this way:
class C
end
C.new()
At this point, how does C.new create an object?
First, C.new is actually Class#new. Its actual body is this:
▼rb_class_new_instance()
VALUE
rb_class_new_instance(argc, argv, klass)
    int argc;
    VALUE *argv;
    VALUE klass;
{
    VALUE obj;

    obj = rb_obj_alloc(klass);
    rb_obj_call_init(obj, argc, argv);

    return obj;
}
(object.c)
rb_obj_alloc() calls the allocate method on klass. In other words, in the example being explained it calls C.allocate. By default this is Class#allocate, whose actual body is rb_class_allocate_instance().
▼rb_class_allocate_instance()
static VALUE
rb_class_allocate_instance(klass)
    VALUE klass;
{
    if (FL_TEST(klass, FL_SINGLETON)) {
        rb_raise(rb_eTypeError,
                 "can't create instance of virtual class");
    }
    if (rb_frame_last_func() != alloc) {
        return rb_obj_alloc(klass);
    }
    else {
        NEWOBJ(obj, struct RObject);
        OBJSETUP(obj, klass, T_OBJECT);
        return (VALUE)obj;
    }
}
(object.c)
rb_newobj() is the function that returns an RVALUE taken from the freelist. NEWOBJ() is just rb_newobj() with a type cast. OBJSETUP() is a macro that initializes the struct RBasic part; you can think of it as existing only so that setting the FL_TAINT flag is not forgotten.
After that, control returns to rb_class_new_instance(), which then calls rb_obj_call_init(). This function calls initialize on the just-created object, and the initialization is complete.
This is summarized as follows:
SomeClass.new            = Class#new (rb_class_new_instance)
    SomeClass.allocate       = Class#allocate (rb_class_allocate_instance)
    SomeClass#initialize     = Object#initialize (rb_obj_dummy)
I could say that the allocate class method initializes physically and initialize initializes logically. This mechanism, in which object creation is divided into allocate and initialize with new presiding over them, is called the “allocation framework”.
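The division of labor between new, allocate and initialize can be mimicked with two function pointers. This is a toy model only: the af_ types and functions are invented for illustration and have nothing to do with ruby’s method dispatch.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct af_object {
    int allocated;       /* set by the physical initialization */
    int initialized;     /* set by the logical initialization */
    long arg;
} af_object;

/* A "class" here is just the pair of hooks that new() drives. */
typedef struct af_class {
    af_object *(*allocate)(void);
    void (*initialize)(af_object *, long);
} af_class;

/* Physical initialization: obtain zeroed memory for the struct. */
static af_object *af_default_allocate(void)
{
    af_object *obj = calloc(1, sizeof(*obj));

    if (obj) obj->allocated = 1;
    return obj;
}

/* Logical initialization: give the object its meaning. */
static void af_default_initialize(af_object *obj, long arg)
{
    obj->arg = arg;
    obj->initialized = 1;
}

/* new = allocate, then initialize, in the manner of rb_class_new_instance(). */
static af_object *af_new(const af_class *klass, long arg)
{
    af_object *obj;

    assert(klass && klass->allocate && klass->initialize);
    obj = klass->allocate();
    if (!obj) return NULL;
    klass->initialize(obj, arg);
    return obj;
}
```

A class that needs a different struct layout would replace only the allocate hook, which is the point of separating the two steps.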
Creating User Defined Objects
Next, we’ll examine the instance creation of classes defined in extension libraries. As they are user-defined, their struct is not fixed, so unless we tell it how to allocate one, ruby cannot know how to create the object. Let’s look at how to tell it.
Data_Wrap_Struct()
Whether a class is user-defined or not, its creation mechanism can follow the allocation framework. This means that when defining a new class SomeClass in C, we override both SomeClass.allocate and SomeClass#initialize.
Let’s look at the allocate side first. Here, the physical initialization is done. What has to be allocated? I mentioned that an instance of a user-defined class is a pair of a struct RData and a user-prepared struct. We’ll assume that the struct is of type struct my. To create a VALUE based on the struct my, you can use Data_Wrap_Struct(). It is used like this:
struct my *ptr = malloc(sizeof(struct my));  /* arbitrarily allocate in the heap */
VALUE val = Data_Wrap_Struct(data_class, mark_f, free_f, ptr);
data_class is the class that val belongs to, and ptr is the pointer to be wrapped. mark_f is (the pointer to) the function that marks this struct. It does not mark ptr itself, however; it is used when the struct pointed to by ptr contains VALUEs. On the other hand, free_f is the function that frees ptr itself. The argument of both functions is ptr. Going back a little and reading the marking code may help you understand things around here in one go.
Let’s also look at the content of Data_Wrap_Struct().
▼Data_Wrap_Struct()
#define Data_Wrap_Struct(klass, mark, free, sval) \
    rb_data_object_alloc(klass, sval,             \
                         (RUBY_DATA_FUNC)mark,    \
                         (RUBY_DATA_FUNC)free)

typedef void (*RUBY_DATA_FUNC) _((void*));
(ruby.h)
Most of it is delegated to rb_data_object_alloc().
▼rb_data_object_alloc()
VALUE
rb_data_object_alloc(klass, datap, dmark, dfree)
    VALUE klass;
    void *datap;
    RUBY_DATA_FUNC dmark;
    RUBY_DATA_FUNC dfree;
{
    NEWOBJ(data, struct RData);
    OBJSETUP(data, klass, T_DATA);
    data->data = datap;
    data->dfree = dfree;
    data->dmark = dmark;

    return (VALUE)data;
}
(gc.c)
This is not complicated. Just as for ordinary objects, it prepares an RVALUE by using NEWOBJ() and OBJSETUP(), and sets the members.
Now let’s go back to allocate. We have succeeded in creating a VALUE, so what remains is to put this code in an arbitrary function and define that function on the class with rb_define_singleton_method().
Data_Get_Struct()
The next thing is initialize. Not only initialize but all the methods need a way to pull the struct my* out of the previously created VALUE. To do that, you can use the Data_Get_Struct() macro.
▼Data_Get_Struct()
#define Data_Get_Struct(obj, type, sval) do { \
    Check_Type(obj, T_DATA);                  \
    sval = (type*)DATA_PTR(obj);              \
} while (0)

#define DATA_PTR(dta) (RDATA(dta)->data)
(ruby.h)
As you see, it just takes the pointer (to struct my) from a member of RData. This is simple. Check_Type() just checks the struct type.
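Because the real macros need ruby.h, the wrap and unwrap round trip is sketched here with a toy stand-in for struct RData. All dw_ names are hypothetical, and dw_release() only imitates what the collector is described as doing with free_f. A real extension would use Data_Wrap_Struct() and Data_Get_Struct() themselves.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*dw_func)(void *);     /* same shape as RUBY_DATA_FUNC */

typedef struct dw_data {             /* toy stand-in for struct RData */
    dw_func dmark;
    dw_func dfree;
    void *data;
} dw_data;

struct my { int x; };                /* the user-prepared struct */

static int dw_freed = 0;             /* counts calls to the free function */
static void dw_my_free(void *p) { free(p); dw_freed++; }

/* Roughly what Data_Wrap_Struct() stores. */
static dw_data *dw_wrap(dw_func mark_f, dw_func free_f, void *ptr)
{
    dw_data *d = malloc(sizeof(*d));

    if (!d) return NULL;
    d->dmark = mark_f;
    d->dfree = free_f;
    d->data = ptr;
    return d;
}

/* Roughly what Data_Get_Struct() does, minus the type check. */
static struct my *dw_get(dw_data *d)
{
    assert(d != NULL);
    return (struct my *)d->data;
}

/* What happens when the wrapper dies: free_f receives the wrapped pointer. */
static void dw_release(dw_data *d)
{
    if (d->dfree) d->dfree(d->data);
    free(d);
}
```

Passing NULL as the mark function matches the case described earlier: a struct that contains no VALUE needs no marking.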
The Issues of the Allocation Framework
So far I have explained things innocently, but the current allocation framework actually has a fatal issue. I just described that the object created with allocate is passed on to initialize and the other methods; if that object is not of the class those methods expect, it is a very serious problem. For example, if an object created with the default Object.allocate (Class#allocate) is passed to a method of String, this causes a serious problem. That is because the methods of String are written on the assumption that a struct of type struct RString is given, while the given object is actually a struct RObject. To avoid such a situation, an object created with C.allocate must be passed only to the methods of C or its subclasses.
Of course, this always holds when things are done in the ordinary way. As C.allocate creates an instance of the class C, it is not passed to the methods of other classes. As an exception, it may be passed to the methods of Object, but the methods of Object do not depend on the struct type.
However, what if things are not done in the ordinary way? Since C.allocate is exposed at the Ruby level, by making use of alias or super or the like (though I have not described them yet) the definition of allocate can be moved to another class. In this way you can create an object whose class is String but whose actual struct type is struct RObject. That means you can freely crash ruby from the Ruby level. This is a problem.
The source of the issue is that allocate is exposed at the Ruby level as a method. Conversely, a solution is to define the content of allocate on the class by some means other than a method. So,
rb_define_allocator(rb_cMy, my_allocate);
an alternative like this is currently under discussion.
The original work is Copyright © 2002-2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
Ruby Hacking Guide
Translated by Vincent ISAMBART
Chapter 6: Variables and constants
Outline of this chapter
Ruby variables
In Ruby there are quite a lot of different types of variables and constants. Let’s line them up, starting from the largest scope.
Global variables
Constants
Class variables
Instance variables
Local variables
Instance variables were already explained in chapter 2 “Objects”. In this chapter we’ll talk about:
Global variables
Class variables
Constants
We will talk about local variables in the third part of the book.
API for variables
The object of this chapter’s analysis is variable.c. Let me first introduce the APIs which serve as its entry points.
VALUE rb_iv_get(VALUE obj, char *name)
VALUE rb_ivar_get(VALUE obj, ID name)
VALUE rb_iv_set(VALUE obj, char *name, VALUE val)
VALUE rb_ivar_set(VALUE obj, ID name, VALUE val)
These are the APIs to access instance variables, which have already been described. They are shown here again because their definitions are in variable.c.
VALUE rb_cv_get(VALUE klass, char *name)
VALUE rb_cvar_get(VALUE klass, ID name)
VALUE rb_cv_set(VALUE klass, char *name, VALUE val)
VALUE rb_cvar_set(VALUE klass, ID name, VALUE val)
These functions are the API for accessing class variables. Class variables belong directly to classes, so the functions take a class as parameter. They fall into two groups, depending on whether their name starts with rb_Xv or rb_Xvar. The difference lies in the type of the variable “name”. The ones with a shorter name are generally easier to use because they take a char*. The ones with a longer name are more for internal use as they take an ID.
VALUE rb_const_get(VALUE klass, ID name)
VALUE rb_const_get_at(VALUE klass, ID name)
VALUE rb_const_set(VALUE klass, ID name, VALUE val)
These functions are for accessing constants. Constants also belong to classes, so they take classes as parameter. rb_const_get() follows the superclass chain, whereas rb_const_get_at() does not (it just looks in klass).
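The difference between the two lookup functions can be imitated from the Ruby level. The sketch below is only an analogy: it assumes a reasonably recent Ruby, where Module#const_get and Module#const_defined? accept an optional second argument that decides whether ancestors are searched. The class names ApiDemoBase and ApiDemoChild are made up for illustration.

```ruby
class ApiDemoBase
  Limit = 10
end

class ApiDemoChild < ApiDemoBase
end

# Searching along the superclass chain, in the spirit of rb_const_get()
inherited = ApiDemoChild.const_get(:Limit)

# Looking only in the class itself, in the spirit of rb_const_get_at()
found_locally = ApiDemoChild.const_defined?(:Limit, false)

p inherited
p found_locally
```

The constant is reachable through the child class, yet it is not stored in the child class itself.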
struct global_entry *rb_global_entry(ID name)
VALUE rb_gv_get(char *name)
VALUE rb_gvar_get(struct global_entry *ent)
VALUE rb_gv_set(char *name, VALUE val)
VALUE rb_gvar_set(struct global_entry *ent, VALUE val)
These last functions are for accessing global variables. They are a little different from the others due to the use of struct global_entry. We’ll explain this while describing the implementation.
Points of this chapter
The most important point when talking about variables is “Where and how are variables stored?”, in other words: data structures.
The second most important matter is how we search for the values. The scopes of Ruby variables and constants are quite complicated because variables and constants are sometimes inherited, sometimes looked for outside of the local scope… To gain a better understanding, you should compare the implementation with the specification as you go, thinking things like “It behaves like this in this situation, so its implementation couldn’t be other than this!”
Class variables
Class variables are variables that belong to classes. In Java or C++ they are called static variables. They can be accessed from both the class and its instances. But “from an instance” or “from the class” is information only available in the evaluator, and we do not have one for the moment. So from the C level it’s like having no access range. We’ll just focus on the way these variables are stored.
Reading
The functions to get a class variable are rb_cvar_get() and rb_cv_get(). The function with the longer name takes an ID as parameter and the one with the shorter name takes a char*. Because the one taking an ID seems closer to the internals, we’ll look at it.
▼ rb_cvar_get()
VALUE
rb_cvar_get(klass, id)
    VALUE klass;
    ID id;
{
    VALUE value;
    VALUE tmp;

    tmp = klass;
    while (tmp) {
        if (RCLASS(tmp)->iv_tbl) {
            if (st_lookup(RCLASS(tmp)->iv_tbl,id,&value)) {
                if (RTEST(ruby_verbose)) {
                    cvar_override_check(id, tmp);
                }
                return value;
            }
        }
        tmp = RCLASS(tmp)->super;
    }

    rb_name_error(id,"uninitialized class variable %s in %s",
                  rb_id2name(id), rb_class2name(klass));
    return Qnil;                /* not reached */
}
(variable.c)
This function reads a class variable in klass.
Error management functions like rb_raise() can simply be ignored, as I said before. The rb_name_error() that appears this time is a function for raising an exception, so it can be ignored for the same reasons. In ruby, you can assume that all functions ending with _error raise an exception.
After removing all this, we can see that it just follows klass’s superclass chain one class at a time, searching in each iv_tbl. … At this point, I’d like you to say “What? iv_tbl is the instance variable table, isn’t it?” As a matter of fact, class variables are stored in the instance variable table.
We can do this because when creating IDs, the whole name of the variable is taken into account, including the prefix: rb_intern() will return different IDs for “@var” and “@@var”. At the Ruby level, the variable type is determined only by the prefix, so there’s no way to access a class variable called @var from Ruby.
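The effect of the prefix can be observed from the Ruby level. The following is a small sketch with made-up class names (CvarDemo, CvarDemoChild); it only shows that “@var” and “@@var” on the same class are distinct names and that the class variable is found by walking up the superclass chain. How a current interpreter stores them internally may differ from the iv_tbl layout read here.

```ruby
class CvarDemo
  @var  = "instance variable of the class object"
  @@var = "class variable"

  def self.read_ivar
    @var
  end

  def self.read_cvar
    @@var
  end
end

class CvarDemoChild < CvarDemo
end

p CvarDemo.instance_variables   # only the name with a single @
p CvarDemo.class_variables      # only the name with @@
p CvarDemoChild.read_cvar       # found by climbing to CvarDemo
p CvarDemoChild.read_ivar       # nil: CvarDemoChild has no @var of its own
```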
Constants
It’s a little abrupt, but I’d like you to remember the members of struct RClass. If we exclude the basic member, struct RClass contains:
VALUE super
struct st_table *iv_tbl
struct st_table *m_tbl
Then, considering that:
1. constants belong to a class
2. we can’t see any table dedicated to constants in struct RClass
3. class variables and instance variables are both in iv_tbl
Could it mean that the constants are also…
Assignment
rb_const_set() is a function to set the value of constants: it sets the constant id in the class klass to the value val.
▼ rb_const_set()
void
rb_const_set(klass, id, val)
    VALUE klass;
    ID id;
    VALUE val;
{
    mod_av_set(klass, id, val, Qtrue);
}
(variable.c)
mod_av_set() does all the hard work:
▼ mod_av_set()
static void
mod_av_set(klass, id, val, isconst)
    VALUE klass;
    ID id;
    VALUE val;
    int isconst;
{
    char *dest = isconst ? "constant" : "class variable";

    if (!OBJ_TAINTED(klass) && rb_safe_level() >= 4)
        rb_raise(rb_eSecurityError, "Insecure: can't set %s", dest);
    if (OBJ_FROZEN(klass)) rb_error_frozen("class/module");
    if (!RCLASS(klass)->iv_tbl) {
        RCLASS(klass)->iv_tbl = st_init_numtable();
    }
    else if (isconst) {
        if (st_lookup(RCLASS(klass)->iv_tbl, id, 0) ||
            (klass == rb_cObject && st_lookup(rb_class_tbl, id, 0))) {
            rb_warn("already initialized %s %s", dest, rb_id2name(id));
        }
    }

    st_insert(RCLASS(klass)->iv_tbl, id, val);
}
(variable.c)
Once again, you can ignore the error and warning checks (rb_raise(), rb_error_frozen() and rb_warn()). Here’s what’s left:
▼ mod_av_set() (only the important part)
    if (!RCLASS(klass)->iv_tbl) {
        RCLASS(klass)->iv_tbl = st_init_numtable();
    }
    st_insert(RCLASS(klass)->iv_tbl, id, val);
We’re now sure constants also reside in the instance variable table. It means that in the iv_tbl of struct RClass, the following are mixed together:
1. the class’s own instance variables
2. class variables
3. constants
Reading
We now know how the constants are stored. We’ll now check how they really work.
rb_const_get()
We’ll now look at rb_const_get(), the function to read a constant. This function returns the constant referred to by id from the class klass.
▼ rb_const_get()
VALUE
rb_const_get(klass, id)
    VALUE klass;
    ID id;
{
    VALUE value, tmp;
    int mod_retry = 0;

    tmp = klass;
  retry:
    while (tmp) {
        if (RCLASS(tmp)->iv_tbl &&
            st_lookup(RCLASS(tmp)->iv_tbl,id,&value)) {
            return value;
        }
        if (tmp == rb_cObject && top_const_get(id, &value))
            return value;
        tmp = RCLASS(tmp)->super;
    }
    if (!mod_retry && BUILTIN_TYPE(klass) == T_MODULE) {
        mod_retry = 1;
        tmp = rb_cObject;
        goto retry;
    }

    /* Uninitialized constant */
    if (klass && klass != rb_cObject) {
        rb_name_error(id, "uninitialized constant %s at %s",
                      rb_id2name(id),
                      RSTRING(rb_class_path(klass))->ptr);
    }
    else { /* global_uninitialized */
        rb_name_error(id, "uninitialized constant %s",rb_id2name(id));
    }
    return Qnil;                /* not reached */
}
(variable.c)
There’s a lot of code in the way. First, we should at least remove the rb_name_error() calls in the second half. In the middle, what surrounds mod_retry seems to be special handling for modules. Let’s also remove that for the time being. The function gets reduced to this:
▼ rb_const_get (simplified)
VALUE
rb_const_get(klass, id)
    VALUE klass;
    ID id;
{
    VALUE value, tmp;

    tmp = klass;
    while (tmp) {
        if (RCLASS(tmp)->iv_tbl &&
            st_lookup(RCLASS(tmp)->iv_tbl,id,&value)) {
            return value;
        }
        if (tmp == rb_cObject && top_const_get(id, &value))
            return value;
        tmp = RCLASS(tmp)->super;
    }
}
Now it should be pretty easy to understand. The function searches for the constant in iv_tbl while climbing klass’s superclass chain. That means:
class A
  Const = "ok"
end
class B < A
  p(Const)    # can be accessed
end
The only problem remaining is top_const_get(). This function is only called for rb_cObject, so top must mean “top-level”. If you don’t remember, at the top-level the class is Object. This is the same idea as “inside the class statement defining C, the class becomes C”: at the top-level, the class is Object.
            # the class of the top-level is Object
class A
            # the class is A
  class B
            # the class is B
  end
end
So top_const_get() probably does something specific to the top level.
top_const_get()
Let’s look at this top_const_get() function. It looks up the constant id, writes its value into klassp and returns.
▼ top_const_get()
static int
top_const_get(id, klassp)
    ID id;
    VALUE *klassp;
{
    /* pre-defined class */
    if (st_lookup(rb_class_tbl, id, klassp)) return Qtrue;

    /* autoload */
    if (autoload_tbl && st_lookup(autoload_tbl, id, 0)) {
        rb_autoload_load(id);
        *klassp = rb_const_get(rb_cObject, id);
        return Qtrue;
    }
    return Qfalse;
}
(variable.c)
rb_class_tbl was already mentioned in chapter 4 “Classes and modules”. It’s the table for storing the classes defined at the top-level. Built-in classes like String or Array, for example, have an entry in it. That’s why we should not forget to search in this table when looking for top-level constants.
The next block is related to autoloading. This feature lets us register a library that is loaded automatically when a particular top-level constant is accessed for the first time. It can be used like this:
autoload(:VeryBigClass, "verybigclass")   # VeryBigClass is defined in it
After this, when VeryBigClass is accessed for the first time, the verybigclass library is loaded (with require). As long as VeryBigClass is defined in the library, execution can continue smoothly. It’s an efficient approach when a library is very big and a lot of time is spent loading it.
This autoload is processed by rb_autoload_xxxx(). We won’t discuss autoload further in this chapter because there will probably be a big change in how it works soon.
(translator’s note: The way autoload works did change in 1.8: autoloaded constants do not need to be defined at top-level anymore).
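The behaviour of autoload can be tried out with a throwaway library. The sketch below assumes a reasonably recent Ruby; the file name rhg_autoload_demo.rb and the constant RhgAutoloadDemo are invented for the example, and the library is written to a temporary directory so that the script is self-contained.

```ruby
require "tmpdir"
require "fileutils"

dir  = Dir.mktmpdir
path = File.join(dir, "rhg_autoload_demo.rb")
File.write(path, "class RhgAutoloadDemo\n  SIZE = :big\nend\n")

# Register the library; nothing is loaded at this point.
Object.autoload(:RhgAutoloadDemo, path)
pending = Object.autoload?(:RhgAutoloadDemo)     # the path, while the load is pending

# The first access to the constant triggers the load.
size = RhgAutoloadDemo::SIZE
after_load = Object.autoload?(:RhgAutoloadDemo)  # nil once the constant is loaded

p pending
p size
p after_load

FileUtils.remove_entry(dir)
```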
Other classes?
But where did the code for looking up constants in other classes end up? After all, constants are first looked up in the outside classes, then in the superclasses.
In fact, we do not yet have enough knowledge to look at that. The outside classes change depending on the location in the program. In other words, they depend on the program context. So we first need to understand how the internal state of the evaluator is handled. Specifically, this search in other classes is done in the ev_const_get() function of eval.c. We’ll look at it and finish with the constants in the third part of the book.
Global variables
General remarks
Global variables can be accessed from anywhere. Or, put the other way around, there is no need to restrict access to them. Because they are not attached to any context, the table only has to exist in one place, and there’s no need to do any check. Therefore the implementation is very simple.
But there is still quite a lot of code. The reason for this is that the global variables of Ruby are equipped with some gimmicks which make it hard to regard them as mere variables. Features like the following are only available for global variables:
you can “hook” access of global variables
you can alias them with alias
Let’s explain this simply.
Aliases of variables
alias $newname $oldname
After this, you can use $newname instead of $oldname. alias for variables is mainly a countermeasure for “symbol variables”. “Symbol variables” are variables inherited from Perl like $= or $0. $= decides whether upper and lower case letters are differentiated during string comparison. $0 holds the name of the main Ruby program. There are some other symbol variables, but as their names are only one character long, they are difficult to remember for people who don’t know Perl. So aliases were created to make them a little easier to understand.
That said, symbol variables are currently not recommended, and they are being moved one by one into singleton methods of suitable modules. The current school of thought is that $= and others will be abolished in 2.0.
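As a quick illustration of what an alias means, the following sketch uses two made-up variable names ($rhg_old_name and $rhg_new_name). Both names refer to one and the same variable, so a write through either of them is visible through the other.

```ruby
$rhg_old_name = "first"
alias $rhg_new_name $rhg_old_name

$rhg_new_name = "second"   # writing through the alias...
p $rhg_old_name            # ...is visible under the original name

$rhg_old_name = "third"    # and the other way around
p $rhg_new_name
```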
Hooks
You can “hook” reads and writes of global variables.
Although hooks can also be set at the Ruby level, I think their purpose is rather to prepare special variables for system use, like $KCODE, at the C level. $KCODE is the variable containing the encoding the interpreter currently uses to handle strings. Essentially only special strings like "EUC" or "UTF8" can be assigned to it, but this is too bothersome, so it is designed so that "e" or "u" can also be used.
p($KCODE)      # "NONE" (default)
$KCODE = "e"
p($KCODE)      # "EUC"
$KCODE = "u"
p($KCODE)      # "UTF8"
Knowing that you can hook assignment of global variables, you should easily understand how this can be done. By the way, $KCODE’s K comes from “kanji” (the name of Chinese characters in Japanese).
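For reference, here is a sketch of a hook set at the Ruby level, assuming an interpreter that provides Kernel#trace_var and Kernel#untrace_var. The variable name $rhg_hooked is made up. Note that such a Ruby-level hook only observes the assigned value; it cannot replace it the way the C-level setter behind $KCODE does.

```ruby
observed = []

# Register a write hook; the block receives each newly assigned value.
trace_var(:$rhg_hooked) { |value| observed << value }

$rhg_hooked = "e"
$rhg_hooked = "u"

# Remove the hook; later assignments are no longer reported.
untrace_var(:$rhg_hooked)
$rhg_hooked = "not seen by the hook"

p observed
```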
You might say that even with alias or hooks, global variables just aren’t used much, so it’s functionality that doesn’t really matter. It’s reasonable not to talk much about rarely used features, and I’d like to use more pages for the analysis of the parser and evaluator. That’s why I’ll proceed with the explanation below with a degree of half-heartedness of 85%.
Data structure
I said that the point when looking at how variables work is the way they are stored. First, I’d like you to firmly grasp the structure used by global variables.
▼ Data structure for global variables
static st_table *rb_global_tbl;

struct global_entry {
    struct global_variable *var;
    ID id;
};

struct global_variable {
    int   counter;      /* reference counter */
    void *data;         /* value of the variable */
    VALUE (*getter)();  /* function to get the variable */
    void  (*setter)();  /* function to set the variable */
    void  (*marker)();  /* function to mark the variable */
    int block_trace;
    struct trace_var *trace;
};
(variable.c)
rb_global_tbl is the main table. All global variables are stored in this table. The keys of this table are of course variable names (ID). A value is expressed by a struct global_entry and a struct global_variable (figure 1).
Figure 1: Global variables table at execution time
The structure representing the variables is split in two to make aliases possible. When an alias is established, two global_entry structs point to the same struct global_variable.
It’s at this time that the reference counter (the counter member of struct global_variable) is necessary. I explained the general idea of a reference counter in the earlier chapter “Garbage collection”. Reviewing it briefly: when a new reference to the structure is made, the counter is incremented by 1. When the reference is not used anymore, the counter is decreased by 1. When the counter reaches 0, the structure is no longer needed, so free() can be called.
When hooks are set at the Ruby level, a list of struct trace_var is stored in the trace member of struct global_variable, but I won’t talk about it, and I omit struct trace_var.
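The two-level structure can be modelled in a few lines of Ruby. This is a toy model only: RhgGlobalVariable, RhgGlobalEntry and the hash standing in for rb_global_tbl are invented names that mirror the C structs, not ruby’s actual API.

```ruby
# Toy stand-ins for struct global_variable and struct global_entry.
RhgGlobalVariable = Struct.new(:counter, :data)
RhgGlobalEntry    = Struct.new(:var, :id)

global_tbl = {}   # plays the role of rb_global_tbl

# Defining $oldname: one entry, one variable, reference count 1.
var = RhgGlobalVariable.new(1, "value")
global_tbl[:$oldname] = RhgGlobalEntry.new(var, :$oldname)

# alias $newname $oldname: a second entry pointing at the same variable.
var.counter += 1
global_tbl[:$newname] = RhgGlobalEntry.new(var, :$newname)

# A write through one name is visible through the other.
global_tbl[:$newname].var.data = "changed"

p global_tbl[:$oldname].var.data
p var.counter
```

Because each name has its own entry while the value lives in the shared structure, the counter is what tells us when the shared structure may finally be released.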
Reading
You can get a general understanding of global variables just by looking at how they are read. The functions for reading them are rb_gv_get() and rb_gvar_get().
▼ rb_gv_get() rb_gvar_get()
VALUE
rb_gv_get(name)
    const char *name;
{
    struct global_entry *entry;

    entry = rb_global_entry(global_id(name));
    return rb_gvar_get(entry);
}

VALUE
rb_gvar_get(entry)
    struct global_entry *entry;
{
    struct global_variable *var = entry->var;
    return (*var->getter)(entry->id, var->data, var);
}
(variable.c)
A substantial part of the work seems to revolve around the rb_global_entry() function, but that does not prevent us from understanding what’s going on. global_id is a function that converts a char* to an ID and checks that it’s the ID of a global variable. (*var->getter)(...) is of course a function call using the function pointer var->getter. If p is a function pointer, (*p)(arg) calls the function.
But the main part is still rb_global_entry().
▼ rb_global_entry()
struct global_entry*
rb_global_entry(id)
    ID id;
{
    struct global_entry *entry;

    if (!st_lookup(rb_global_tbl, id, &entry)) {
        struct global_variable *var;
        entry = ALLOC(struct global_entry);
        st_add_direct(rb_global_tbl, id, entry);
        var = ALLOC(struct global_variable);
        entry->id = id;
        entry->var = var;
        var->counter = 1;
        var->data = 0;
        var->getter = undef_getter;
        var->setter = undef_setter;
        var->marker = undef_marker;

        var->block_trace = 0;
        var->trace = 0;
    }
    return entry;
}
(variable.c)
The main work is done by the st_lookup() at the beginning. What’s done afterwards is just creating (and storing) a new entry. Since an entry is automatically created when a nonexistent global variable is accessed, rb_global_entry() will never return NULL.
This was mainly done for speed. When the parser finds a global variable, it gets the corresponding struct global_entry. When reading the value of the variable, the value is just obtained from the entry (using rb_gvar_get()).
Let’s now continue a little with the code that follows. var->getter and the others are set to undef_xxxx. undef probably means that they are the setter/getter/marker for a global variable whose state is undefined.
undef_getter() just shows a warning and returns nil, as even undefined global variables can be read. undef_setter() is a little bit interesting, so let’s look at it.
▼ undef_setter()
static void
undef_setter(val, id, data, var)
    VALUE val;
    ID id;
    void *data;
    struct global_variable *var;
{
    var->getter = val_getter;
    var->setter = val_setter;
    var->marker = val_marker;

    var->data = (void*)val;
}
(variable.c)
val_getter() takes the value from the variable’s data member (var->data) and returns it. val_setter() just puts a value in var->data. Setting the handlers this way means we need no special handling for undefined variables (figure 2). Skillfully done, isn’t it?
Figure 2: Setting and consultation of global variables
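The handler-swapping trick can be imitated in Ruby with lambdas in place of C function pointers. This is a toy model only: RhgToyGlobal and the lambda names are invented to mirror the C code, and the warning printed by the real undef_getter() is left out.

```ruby
# Toy stand-in for struct global_variable: a value plus its two handlers.
RhgToyGlobal = Struct.new(:data, :getter, :setter)

val_getter = ->(var) { var.data }
val_setter = ->(var, val) { var.data = val }

undef_getter = ->(_var) { nil }   # undefined variables read as nil
undef_setter = lambda do |var, val|
  # First assignment: switch to the ordinary handlers, then store the value.
  var.getter = val_getter
  var.setter = val_setter
  var.data   = val
end

var = RhgToyGlobal.new(nil, undef_getter, undef_setter)

before = var.getter.call(var)   # handled by undef_getter
var.setter.call(var, 42)        # handled by undef_setter, which swaps handlers
after = var.getter.call(var)    # now handled by val_getter

p before
p after
p var.setter.equal?(val_setter)
```

The caller never checks whether the variable is defined; it always goes through the current handlers, and the first write quietly replaces them.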
Translated by Clifford Escobar CAOILE & ocha-
Chapter 7: Security
Fundamentals
I say security, but I don’t mean passwords or encryption. The Ruby security feature is used for handling untrusted objects in an environment like CGI programming.
For example, when you want to convert a string representing a number into an integer, you can use the eval method. However, eval is a method that “runs a string as a Ruby program.” If you eval a string from an unknown person on the network, it is very dangerous. However, fully differentiating between safe and unsafe things is very tiresome and cumbersome for the programmer, so it is certain that a mistake will be made sooner or later. So let us make it part of the language: that was the reasoning behind this feature.
So then, how does Ruby protect us from that sort of danger? Causes of dangerous operations, for example opening unintended files, are roughly divided into two groups:
Dangerous data
Dangerous code
For the former, the code that handles these values is written by the programmers themselves, so it is (relatively) safe. For the latter, the program code absolutely cannot be trusted.
Because the solution is vastly different between the two causes, it is important to differentiate them by level. These are called security levels. The Ruby security level is represented by the $SAFE global variable. The value ranges from a minimum of 0 to a maximum of 4. Assigning to the variable raises the level. Once the level is raised it can never be lowered. And for each level, the operations are limited.
I will not explain levels 2 or 3. Level 0 is the normal program environment, in which the security system is not running. Level 1 handles dangerous values (level 2 currently has no use). Level 4 handles dangerous code. We can skip 0 and move on to explain levels 1 and 4 in detail.
Level 1
This level is for dangerous data, for example in normal CGI applications.
A per-object “tainted mark” serves as the basis for the Level 1 implementation. All objects read in from outside are marked tainted, and any attempt to eval or File.open with a tainted object raises an exception, which stops the attempt.
This tainted mark is “infectious”. For example, when taking a part of a tainted string, that part is also tainted.
Level 4
This level is for dangerous programs, for example when running external (unknown) programs.
At level 1, operations and the data they use are checked, but at level 4, the operations themselves are restricted: for example exit, file I/O, thread manipulation, redefining methods, and so on. Of course, the tainted mark information is also used, but basically the operations are the criteria.
Unit of Security
$SAFE looks like a global variable but is in actuality a thread-local variable. In other words, Ruby’s security system works in units of threads. In Java and .NET, rights can be set per component (object), but Ruby does not implement that. The assumed main target was probably CGI.
Therefore, if one wants to raise the security level of one part of the program, that part should be made into a different thread and have its security level raised. I haven’t yet explained how to create a thread, but I will show an example here:
# Raise the security level in a different thread
p($SAFE)   # 0 is the default
Thread.fork {    # Start a different thread
    $SAFE = 4    # Raise the level
    eval(str)    # Run the dangerous program
}
p($SAFE)   # Outside of the block, the level is still 0
Reliability of $SAFE
Even with the spreading of tainted marks and the restriction of operations implemented, ultimately it is all still handled manually. In other words, internal and external libraries must all cooperate completely; if one of them does not, the taint stops spreading partway and the security is lost. And actually this kind of hole is often reported. For this reason, this writer does not wholly trust it.
That is not to say, of course, that all Ruby programs are dangerous. Even at $SAFE=0 it is possible to write a secure program, and even at $SAFE=4 it is possible to write a program that does whatever you please. However, one cannot put too much confidence in $SAFE (yet).
In the first place, functionality and security do not go together. It is common sense that adding new features can make holes easier to open. Therefore it is prudent to think that ruby can probably be dangerous.
Implementation
From now on, we’ll start to look into its implementation. In order to wholly grasp the security system of ruby, we have to look at where the checks are made rather than at the mechanism. However, this time we don’t have enough pages to do that, and just listing the places is not interesting. Therefore, in this chapter, I’ll only describe the mechanism used for security checks. The APIs used for checking are mainly these two:
rb_secure(n): if the current level is greater than or equal to n, it raises SecurityError.
SafeStringValue(): if the current level is greater than or equal to 1 and the string is tainted, it raises an exception.
We won’t read SafeStringValue() here.
Tainted Mark
The taint mark is, concretely, the FL_TAINT flag, which is set in basic->flags, and what is used to spread it is the OBJ_INFECT() macro. Here is its usage.
OBJ_TAINT(obj)            /* set FL_TAINT to obj */
OBJ_TAINTED(obj)          /* check if FL_TAINT is set to obj */
OBJ_INFECT(dest, src)     /* infect FL_TAINT from src to dest */
Since OBJ_TAINT() and OBJ_TAINTED() can be assumed not important, let’s briefly look over only OBJ_INFECT().
▼ OBJ_INFECT
#define OBJ_INFECT(x,s) do {                             \
    if (FL_ABLE(x) && FL_ABLE(s))                        \
        RBASIC(x)->flags |= RBASIC(s)->flags & FL_TAINT; \
} while (0)
(ruby.h)
FL_ABLE() checks whether the argument VALUE is a pointer or not. If both objects are pointers (meaning each of them has a flags member), the flag is propagated.
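The bit manipulation in OBJ_INFECT() can be tried out with plain integers. This is a toy model only: the flag value RHG_FL_TAINT, the RhgFlagged struct and the rhg_infect method are invented for illustration and do not correspond to ruby’s real flag layout.

```ruby
# Invented flag bit standing in for FL_TAINT.
RHG_FL_TAINT = 1 << 8

# Toy object that carries nothing but a flags word.
RhgFlagged = Struct.new(:flags)

# Mirrors OBJ_INFECT(x, s): copy only the taint bit from src into dest.
def rhg_infect(dest, src)
  dest.flags |= src.flags & RHG_FL_TAINT
end

tainted = RhgFlagged.new(RHG_FL_TAINT | 0b011)   # taint bit plus unrelated bits
clean   = RhgFlagged.new(0b100)

rhg_infect(clean, tainted)

taint_spread   = (clean.flags & RHG_FL_TAINT) != 0
unrelated_bits = clean.flags & 0b011

p taint_spread
p unrelated_bits
```

Masking with the taint bit before the OR is what keeps the other flags of the source object from leaking into the destination.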
$SAFE
▼ ruby_safe_level
int ruby_safe_level = 0;

static void
safe_setter(val)
    VALUE val;
{
    int level = NUM2INT(val);

    if (level < ruby_safe_level) {
        rb_raise(rb_eSecurityError, "tried to downgrade safe level from %d to %d",
                 ruby_safe_level, level);
    }
    ruby_safe_level = level;
    curr_thread->safe = level;
}
(eval.c)
The substance of $SAFE is ruby_safe_level in eval.c. As I wrote earlier, $SAFE is local to each thread, so it needs to be written in eval.c, where the implementation of threads is located. In other words, it is in eval.c only because of the restrictions of C; in principle it could be located elsewhere.
safe_setter() is the setter of the $SAFE global variable. Because this function is the only way to change the level from the Ruby level, the security level cannot be lowered.
However, as you can see, ruby_safe_level is not declared static, so from the C level you can ignore the interface and modify the security level directly.
rb_secure()
▼ rb_secure()
void
rb_secure(level)
    int level;
{
    if (level <= ruby_safe_level) {
        rb_raise(rb_eSecurityError, "Insecure operation `%s' at level %d",
                 rb_id2name(ruby_frame->last_func), ruby_safe_level);
    }
}
(eval.c)
If the current safe level is greater than or equal to level, this raises SecurityError. It’s simple.
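The two checks can be summarised in a small Ruby model. This is a toy model only: the class RhgSafeLevel and its methods are invented to imitate ruby_safe_level, safe_setter() and rb_secure(); it does not touch the interpreter’s actual security level.

```ruby
class RhgSafeLevel
  attr_reader :level

  def initialize
    @level = 0
  end

  # Like safe_setter(): the level may stay the same or rise, never fall.
  def level=(new_level)
    if new_level < @level
      raise SecurityError, "tried to downgrade safe level from #{@level} to #{new_level}"
    end
    @level = new_level
  end

  # Like rb_secure(n): refuse the operation once the current level reaches n.
  def secure(n)
    raise SecurityError, "Insecure operation at level #{@level}" if n <= @level
  end
end

safe = RhgSafeLevel.new
safe.secure(4)      # level 0 is below 4, so nothing happens
safe.level = 4

downgrade_refused = begin
  safe.level = 0
  false
rescue SecurityError
  true
end

operation_refused = begin
  safe.secure(4)
  false
rescue SecurityError
  true
end

p downgrade_refused
p operation_refused
```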
Chapter 8: Ruby Language Details
I’ll talk about the details of Ruby’s syntax and evaluation which haven’t been covered yet. I didn’t intend a complete exposition, so I left out everything which doesn’t come up in this book. That’s why you won’t be able to write Ruby programs just by reading this. A complete exposition can be found in the Ruby reference manual (archives/ruby-refm.tar.gz on the attached CD-ROM).
Readers who know Ruby can skip over this chapter.
Literals
The expressiveness of Ruby’s literals is extremely high. In my opinion, what makes Ruby a script language is firstly the existence of the top-level, and secondly the expressiveness of its literals. Thirdly, it might be the richness of its standard library.
A single literal already has enormous power, but even more so when multiple literals are combined. Especially the ability to create complex literals that combine hash and array literals is the biggest advantage of Ruby’s literals. One can straightforwardly write, for instance, a hash of arrays of regular expressions.
What kind of expressions are valid? Let’s look at them one by one.
Strings
Strings and regular expressions can’t be missing in a scripting language. The expressiveness of Ruby’s strings is even more varied than that of Ruby’s other literals.
Single Quoted Strings
'string'              # 「string」
'\\begin{document}'   # 「\begin{document}」
'\n'                  # 「\n」 backslash and an n, not a newline
'\1'                  # 「\1」 backslash and 1
'\''                  # 「'」
This is the simplest form. In C, what is enclosed in single quotes becomes a character, but in Ruby it becomes a string. Let’s call this a '-string. The backslash escape is in effect only for \ itself and '. If one puts a backslash in front of another character, the backslash remains, as in the third and fourth examples.
And Ruby’s strings aren’t divided by newline characters. If we write a string over several lines, the newlines are contained in the string.
'multi
    line
        string'
And if the -K option is given to the ruby command, multibyte strings will be accepted. At present the three encodings EUC-JP (-Ke), Shift JIS (-Ks), and UTF8 (-Ku) can be specified.
'「漢字が通る」と「マルチバイト文字が通る」はちょっと違う'
# 'There's a little difference between "Kanji are accepted" and "Multibyte characters are accepted".'
Double Quoted Strings
"string"            # 「string」
"\n"                # newline
"\x0f"              # a byte given in hexadecimal form
"page#{n}.html"     # embedding an expression
With double quotes we can use expression expansion and backslash notation. The backslash notation is something classical that is also supported in C; for instance, \n is a newline and \b is a backspace. In Ruby, Ctrl-C and ESC can also be expressed, which is convenient. However, merely listing the whole notation is not fun, and as far as the implementation is concerned, it just means a large number of cases to be handled with nothing especially interesting. Therefore, they are entirely left out here.
On the other hand, expression expansion is even more fantastic. We can write an arbitrary Ruby expression inside #{ } and it will be evaluated at runtime and embedded into the string. There are no limitations like only one variable or only one method. At this point, it is not a mere literal anymore; the entire thing can be considered an expression that produces a string.
"embedded #{lvar} expression"
"embedded #{@ivar} expression"
"embedded #{1 + 1} expression"
"embedded #{method_call(arg)} expression"
"embedded #{"string in string"} expression"
Strings with %
%q(string)      # same as 'string'
%Q(string)      # same as "string"
%(string)       # same as %Q(string) or "string"
If a lot of separator characters appear in a string, escaping all of them becomes a burden. In that case the separator characters can be changed by using %. In the following example, the same string is written as a "-string and as a %-string.
"<a href=\"http://i.loveruby.net#{path}\">"
%Q(<a href="http://i.loveruby.net#{path}">)
Both expressions have the same length, but the %-one is a lot nicer to look at. When there are more characters to escape, the %-string also has the advantage in length.
Here we have used parentheses as delimiters, but something else is fine, too, like brackets or braces or #. Almost every symbol is fine, even %.
%q#this is string#
%q[this is string]
%q%this is string%
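The correspondence between the notations can be checked directly. In this sketch the value of path is made up; %Q yields exactly the same string as the escaped double-quoted form, while %q behaves like single quotes and leaves #{path} untouched.

```ruby
path = "/ja/"

double  = "<a href=\"http://i.loveruby.net#{path}\">"
percent = %Q(<a href="http://i.loveruby.net#{path}">)
literal = %q(<a href="http://i.loveruby.net#{path}">)   # no expansion, like '...'

p double == percent
p literal.include?('#{path}')
```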
Here Documents
A here document is a syntax which can express strings spanning multiple lines. A normal string starts right after the delimiter " and everything up to the ending " is its content. With a here document, the lines between the line which contains the starting <<EOS and the line which contains the ending EOS are the content.
"the characters between the starting symbol and the ending symbol
will become a string."
<<EOS
All lines between the starting and
the ending line are in this
here document
EOS
Here we used EOS as the identifier, but any word is fine. Precisely speaking, all characters matching [a-zA-Z_0-9] and multi-byte characters can be used.
The characteristic of a here document is that the delimiters are “the lines containing the starting identifier or the ending identifier”. The line which contains the start symbol is the starting delimiter. Therefore, the position of the start identifier within the line is not important. Taking advantage of this, it doesn’t matter if, for instance, it is written in the middle of an expression:
printf(<<EOS, count_n(str))
count=%d
EOS
In this case the string "count=%d\n" goes in the place of <<EOS. So it’s the same as the following.
printf("count=%d\n", count_n(str))
The position of the starting identifier is really not restricted, but in contrast there are strict rules for the ending symbol: it must be at the beginning of the line and there must not be any other character on that line. However, if we write the start symbol with a minus, like <<-EOS, we can indent the line with the end symbol.
     <<-EOS
It would be convenient if one could indent the content
of a here document. But that's not possible.
If you want that, writing a method to delete indents is
usually a way to go. But beware of tabs.
     EOS
Furthermore, the start symbol can be enclosed in single or double quotes. Then the properties of the whole here document change. With <<"EOS" (which behaves the same as a plain <<EOS) we can use embedded expressions and backslash notation.
    <<"EOS"
One day is #{24 * 60 * 60} seconds.
Incredible.
EOS
But <<'EOS' is not the same as a single quoted string. It starts a completely literal mode: everything, even backslashes, goes into the string as typed. This is useful for a string which contains many backslashes.
In Part 2, I’ll explain how to parse a here document. But I’d like you to try to guess how it is done beforehand.
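The three variants can be compared side by side. This is a small sketch with a made-up variable name; it assumes the code is placed at the top level of a script, since the terminators of the first two here documents must start at the beginning of their lines.

```ruby
seconds = 24 * 60 * 60

# Double-quoted style: expressions and backslash notation are processed.
expanded = <<"EOS"
One day is #{seconds} seconds.
EOS

# Single-quoted style: everything is taken exactly as typed.
raw = <<'EOS'
One day is #{seconds} seconds.\n
EOS

# With a minus sign, the terminator may be indented (the content is not stripped).
indented = <<-EOS
  The end marker may be indented.
  EOS

p expanded
p raw
p indented
```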
Characters
Ruby strings are byte sequences; there are no character objects. Instead there are the following expressions, which return the integer corresponding to a certain character in ASCII code.
?a      # the integer which corresponds to "a"
?.      # the integer which corresponds to "."
?\n     # LF
?\C-a   # Ctrl-a
Regular Expressions
/regexp/
/^Content-Length:/i
/正規表現/
/\/\*.*?\*\//m        # An expression which matches C comments
/reg#{1 + 1}exp/      # the same as /reg2exp/
What is contained between slashes is a regular expression. Regular expressions are a language for designating string patterns. For example
/abc/
This regular expression matches a string where there’s an a followed by a b followed by a c. It matches “abc” or “fffffffabc” or “abcxxxxx”.
One can designate more special patterns.
/^From:/
This matches a string where there’s a From followed by a : at the beginning of a line. There are several more expressions of this kind, so that one can create quite complex patterns.
The uses are infinite: changing the matched part to another string, deleting the matched part, determining if there’s a match, and so on…
A more concrete use case would be, for instance, extracting the From: header from a mail, or changing the \n to an \r, or checking if a string looks like a mail address.
Since the regular expression itself is an independent language, it has its own parser and evaluator, which are different from ruby’s. They can be found in regex.c. Hence, it’s enough for ruby to be able to cut out the regular expression part from a Ruby program and feed it to them. As a consequence, regular expressions are treated almost the same as strings from the grammatical point of view. Almost all of the features which strings have, like escapes, backslash notations and embedded expressions, can be used in the same way in regular expressions.
However, we can say they are treated the same as strings only from the viewpoint of “Ruby’s syntax”. As mentioned before, since the regular expression itself is a language, naturally we have to follow its language constraints. Describing regular expressions in detail would take so much space that another book could be written, so I’d like you to read another book on this subject. I recommend “Mastering Regular Expressions” by Jeffrey E.F. Friedl.
Regular Expressions with %
As with strings, regular expressions also have a syntax for changing delimiters. In this case it is %r. Looking at some examples is enough to understand it.
%r(regexp)
%r[/\*.*?\*/]            # matches a C comment
%r("(?:[^"\\]+|\\.)*")   # matches a string in C
%r{reg#{1 + 1}exp}       # embedding a Ruby expression
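Two of the patterns above can be tried on a small piece of C-like text. The sample text is made up for this sketch, and the m option is added to the comment pattern so that a comment may also span several lines.

```ruby
c_comment = %r[/\*.*?\*/]m
c_string  = %r("(?:[^"\\]+|\\.)*")

# Single quotes keep the backslashes in front of the inner double quotes.
source = 'int x = 1; /* set x */ puts("say \"hi\""); /* done */'

comments     = source.scan(c_comment)
first_string = source[c_string]

p comments
puts first_string
```

Thanks to the alternate delimiters, neither pattern needs a backslash in front of a slash.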
ArraysAcomma-separatedlistenclosedinbrackets[]isanarrayliteral.
[1,2,3]['This','is','an','array','of','string']
[/regexp/,{'hash'=>3},4,'string',?\C-a]
lvar=$gvar=@ivar=@@cvar=nil[lvar,$gvar,@ivar,@@cvar][Object.new(),Object.new(),Object.new()]
Ruby's array (Array) is a list of arbitrary objects. From a syntactical standpoint, its characteristic is that arbitrary expressions can be elements. As mentioned earlier, an array of hashes of regular expressions can easily be made. Not just literals but also expressions which combine variables or method calls can be written straightforwardly.
Note that this is "an expression which generates an array object", as with the other literals.
i = 0
while i < 5
  p([1,2,3].id)    # Each time another object id is shown.
  i += 1
end
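The same point can be checked without reading object ids by eye. This is only a minimal sketch: it assumes a current Ruby, where the method is spelled object_id rather than id, and the helper name fresh_array is made up for illustration.

```ruby
# Hypothetical helper: every call evaluates the array literal again.
def fresh_array
  [1, 2, 3]
end

first  = fresh_array
second = fresh_array

p first == second                       # compares the contents
p first.equal?(second)                  # compares object identity
p first.object_id == second.object_id   # same question, asked via the ids
```

The contents compare equal, but the two results are distinct objects, which is what "an expression which generates an array object" implies.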
Word Arrays
When writing scripts one uses arrays of strings a lot, hence there is a special notation only for arrays of strings. That is %w. With an example it's immediately obvious.
%w( alpha beta gamma delta )   # ['alpha','beta','gamma','delta']
%w( 月 火 水 木 金 土 日 )
%w( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec )
There's also %W, where expressions can be embedded. It's a feature implemented fairly recently.
n = 5
%w( list0 list#{n} )   # ['list0', 'list#{n}']
%W( list0 list#{n} )   # ['list0', 'list5']
The author hasn't come up with a good use of %W yet.
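The difference between the two notations can be put in a small sketch. The method names plain_words and embedded_words are hypothetical and exist only for this illustration.

```ruby
# Hypothetical helpers contrasting %w and %W.
def plain_words(n)
  %w( list0 list#{n} )    # #{n} stays as literal text; n is not used
end

def embedded_words(n)
  %W( list0 list#{n} )    # #{n} is evaluated
end

p plain_words(5)
p embedded_words(5)
```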
Hashes
Hash tables are data structures which store a one-to-one relation between arbitrary objects. Written as follows, they are expressions which generate tables.
{ 'key' => 'value', 'key2' => 'value2' }
{ 3 => 0, 'string' => 5, ['array'] => 9 }
{ Object.new() => 3, Object.new() => 'string' }

# Of course we can put it in several lines.
{ 0 => 0,
  1 => 3,
  2 => 6 }
We explained hashes in detail in the third chapter "Names and Nametables". They are fast lookup tables which allocate memory slots depending on the hash values. In Ruby grammar, both keys and values can be arbitrary expressions.
Furthermore, when used as an argument of a method call, the {...} can be omitted under a certain condition.
  some_method(arg, key => value, key2 => value2)
# some_method(arg, {key => value, key2 => value2}) # same as above
With this we can imitate named (keyword) arguments.
button.set_geometry('x' => 80, 'y' => '240')
Of course in this case set_geometry must accept a hash as input. Real keyword arguments would be transformed into parameter variables, but that is not the case here because this is just an "imitation".
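What "must accept a hash as input" means on the receiving side might look like the following sketch. The Button class and its accessors are hypothetical, and integer coordinates are assumed for simplicity.

```ruby
# Hypothetical receiver; this Button is not a real library class.
class Button
  attr_reader :x, :y

  # The brace-less key => value list arrives as one ordinary Hash parameter.
  def set_geometry(opts)
    @x = opts['x']
    @y = opts['y']
    self
  end
end

button = Button.new
button.set_geometry('x' => 80, 'y' => 240)
p [button.x, button.y]
```

Inside set_geometry the values have to be fetched from the hash by hand, which is exactly why this is only an imitation of keyword arguments.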
Ranges
Range literals are oddballs which don't appear in most other languages. Here are some expressions which generate Range objects.
0..5          # from 0 to 5 containing 5
0...5         # from 0 to 5 not containing 5
1+2 .. 9+0    # from 3 to 9 containing 9
'a'..'z'      # strings from 'a' to 'z' containing 'z'
If there are two dots the last element is included. If there are three dots it is not included. Not only integers but also floats and strings can be made into ranges; even a range between arbitrary objects can be created if you attempt it. However, this is a specification of the Range class, which is the class of range objects (that is, it is a matter of the library), not a matter of grammar. From the parser's standpoint, it just allows arbitrary expressions to be joined with .. and nothing more. If a range cannot be generated from the objects resulting from the evaluation, it is a runtime error.
By the way, because the precedence of .. and ... is quite low, sometimes it is interpreted in a surprising way.
1..5.to_a()   # 1..(5.to_a())
I think my personality is relatively bent toward Ruby grammar, but somehow this is the one specification I don't like.
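The low precedence can be observed with a method that every integer has. This is a small sketch; the method names are made up, and succ is used in place of to_a only because integers in a current Ruby no longer respond to to_a.

```ruby
# Hypothetical helpers showing how .. groups with a method call.
def low_precedence_range
  1..2.succ        # parsed as 1..(2.succ)
end

def parenthesized_range
  (1..2).to_a      # parentheses make the range itself the receiver
end

p low_precedence_range
p parenthesized_range
```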
Symbols
In Part 1, we talked about symbols at length. A symbol is something that corresponds one-to-one to an arbitrary string. In Ruby, symbols are expressed with a : in front.
:identifier
:abcde
These examples are pretty normal. Actually, besides them, all variable names and method names can become symbols with a : in front. Like this:
:$gvar
:@ivar
:@@cvar
:CONST
Moreover, though we haven't talked about this yet, [] or attr= can be used as method names, so naturally they can also be used as symbols.
:[]
:attr=
When one uses these symbols as values in an array, it'll look quite complicated.
Numerical Values
This is the least interesting. One possible thing I can introduce here is that, when writing a million,
1_000_000
as written above, we can use underscore delimiters in the middle. But even this isn't particularly interesting. From here on in this book, we'll completely forget about numerical values.
Methods
Let's talk about the definition and calling of methods.
Definition and Calls
def some_method( arg )
  ....
end

class C
  def some_method( arg )
    ....
  end
end
Methods are defined with def. If they are defined at toplevel they become function style methods; inside a class they become methods of this class. To call a method which was defined in a class, one usually has to create an instance with new, as shown below.
C.new().some_method(0)
The Return Value of Methods
The return value of a method is, if a return is executed in the middle, its value. Otherwise, it's the value of the statement which was executed last.
def one()     # 1 is returned
  return 1
  999
end

def two()     # 2 is returned
  999
  2
end

def three()   # 3 is returned
  if true then
    3
  else
    999
  end
end
If the method body is empty, the value is automatically nil, and an expression without a value cannot be put at the end. Hence every method has a return value.
Optional Arguments
Optional arguments can also be defined. If the number of arguments doesn't suffice, the parameters are automatically assigned their default values.
def some_method( arg = 9 )  # default value is 9
  p arg
end

some_method(0)    # 0 is shown.
some_method()     # The default value 9 is shown.
There can also be several optional arguments. But in that case they must all come at the end of the argument list. If elements in the middle of the list were optional, the correspondence between arguments and parameters would be very unclear.
def right_decl( arg1, arg2, darg1 = nil, darg2 = nil )
  ....
end

# This is not possible
def wrong_decl( arg, default = nil, arg2 )  # A middle argument cannot be optional
  ....
end
Omitting argument parentheses
In fact, the parentheses of a method call can be omitted.
puts 'Hello, World!'   # puts("Hello, World!")
obj = Object.new       # obj = Object.new()
In Python we can get the method object by leaving out parentheses, but there is no such thing in Ruby.
If you'd like to, you can omit more parentheses.
  puts(File.basename fname)
# puts(File.basename(fname)) same as the above
If we like we can even leave out more.
  puts File.basename fname
# puts(File.basename(fname)) same as the above
However, recently this kind of "nested omission" became a cause of warnings. It's likely that this will not pass any more in Ruby 2.0.
Actually, even the parentheses of the parameter definition can be omitted.
def some_method param1, param2, param3
end

def other_method    # without arguments ... we see this a lot
end
Parentheses are often left out in method calls, but leaving out parentheses in the definition is not very popular. However, if there are no arguments, the parentheses are frequently omitted.
Arguments and Lists
Because arguments form a list of objects, there's nothing odd if we can do the converse: extracting a list (an array) as arguments, as in the following example.
def delegate(a, b, c)
  p(a, b, c)
end

list = [1, 2, 3]
delegate(*list)   # identical to delegate(1, 2, 3)
In this way we can distribute an array into arguments. Let's call this device a * argument for now. Here we used a local variable for demonstration, but of course there is no limitation. We can also directly put a literal or a method call instead.
m(*[1,2,3])    # We could have written the expanded form in the first place...
m(*mcall())
The * argument can be used together with ordinary arguments, but the * argument must come last. Otherwise, the correspondence to parameter variables cannot be determined in a single way.
In the definition, on the other hand, we can handle the arguments in bulk when we put a * in front of the parameter variable.
def some_method( *args )
  p args
end

some_method()          # prints []
some_method(0)         # prints [0]
some_method(0, 1)      # prints [0,1]
The surplus arguments are gathered in an array. Only one * parameter can be declared. It must also come after the default arguments.
def some_method0( arg, *rest )
end
def some_method1( arg, darg = nil, *rest )
end
If we combine list expansion and bulk reception, the arguments of one method can be passed as a whole to another method. This might be the most practical use of the * parameter.
# a method which passes its arguments to other_method
def delegate(*args)
  other_method(*args)
end

def other_method(a, b, c)
  return a + b + c
end

delegate(0, 1, 2)      # same as other_method(0, 1, 2)
delegate(10, 20, 30)   # same as other_method(10, 20, 30)
Various Method Call Expressions
Being just a single feature called "method call" does not mean its representation is also single. This is where so-called syntactic sugar comes in. In Ruby there is a ton of it, and it is really attractive for a person who has a fetish for parsers. For instance, the examples below are all method calls.
1 + 2                   # 1.+(2)
a == b                  # a.==(b)
~/regexp/               # /regexp/.~
obj.attr = val          # obj.attr=(val)
obj[i]                  # obj.[](i)
obj[k] = v              # obj.[]=(k,v)
`cvs diff abstract.rd`  # Kernel.`('cvs diff abstract.rd')
It's hard to believe until you get used to it, but attr=, []= and ` are (indeed) all method names. They can appear as names in a method definition and can also be used as symbols.
class C
  def []( index )
  end
  def +( another )
  end
end
p(:attr=)
p(:[]=)
p(:`)
As there are people who don't like sweets, there are also many people who dislike syntactic sugar. Maybe it feels unfair to them when things which are essentially the same appear in disguise. (Why's everyone so serious?)
Let's see some more details.
Symbol Appendices
obj.name?
obj.name!
First a small thing. It's just appending a ? or a !. Call and definition do not differ, so it's not too painful. There are conventions for what to use these method names for, but there is no enforcement at the language level. It's just a convention at the human level. This is probably influenced by Lisp, in which a great variety of characters can be used in procedure names.
Binary Operators
1 + 2    # 1.+(2)
Binary operators are converted to a method call on the object on the left hand side. Here the method + of the object 1 is called. As listed below, there are many of them. There are the general operators + and -, also the equivalence operator == and the spaceship operator <=> as in Perl, all sorts. They are listed in order of their precedence.
**
* / %
+ -
<< >>
&
| ^
> >= < <=
<=> == === =~
The symbols & and | are methods, but the double symbols && and || are built-in operators. Remember how it is in C.
Unary Operators
+2
-1.0
~/regexp/
These are the unary operators. There are only three of them: + - ~. + and - work as they look (by default). The operator ~ matches a string or a regular expression against the variable $_. With an integer it stands for bit inversion.
To distinguish the unary + from the binary +, the method names for the unary operators are +@ and -@ respectively. Of course they can be called by just writing +n or -n.
((errata: + or - as the prefix of a numeric literal is actually scanned as a part of the literal. This is a kind of optimization.))
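Defining both forms on one class shows how the names -@ and - keep the unary and binary operators apart. This is a sketch only; the Temperature class is hypothetical.

```ruby
# Hypothetical class defining both the unary and the binary minus.
class Temperature
  attr_reader :degrees

  def initialize(degrees)
    @degrees = degrees
  end

  # Unary minus: called for -t
  def -@
    Temperature.new(-degrees)
  end

  # Binary minus: called for t - other
  def -(other)
    Temperature.new(degrees - other.degrees)
  end
end

t = Temperature.new(30)
p((-t).degrees)
p((t - Temperature.new(10)).degrees)
```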
Attribute Assignment
obj.attr = val   # obj.attr=(val)
This is the attribute assignment form. The above is translated into a call of the method attr=. When using this together with method calls whose parentheses are omitted, we can write code which looks like attribute access.
class C
  def i() @i end          # We can write the definition in one line
  def i=(n) @i = n end
end

c = C.new
c.i = 99
p c.i    # prints 99
However, it turns out that both are method calls. They are similar to get/set properties in Delphi or slot accessors in CLOS.
Besides, we cannot define a method such as obj.attr(arg)=, which would take another argument in the attribute assignment form.
Index Notation
obj[i]    # obj.[](i)
The above is translated into a call of the method []. Array and hash access are also implemented with this device.
obj[i] = val   # obj.[]=(i, val)
This is the index assignment form. It is translated into a call of a method named []=.
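Because [] and []= are ordinary methods, a user-defined class can offer the same notation. The following is a minimal sketch; the Grid class and its two-index interface are assumptions made for illustration.

```ruby
# Hypothetical class that gives index notation to a two-dimensional grid.
class Grid
  def initialize
    @cells = {}
  end

  # grid[x, y] is translated into grid.[](x, y)
  def [](x, y)
    @cells[[x, y]]
  end

  # grid[x, y] = value is translated into grid.[]=(x, y, value)
  def []=(x, y, value)
    @cells[[x, y]] = value
  end
end

grid = Grid.new
grid[1, 2] = :stone
p grid[1, 2]
```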
super
We relatively often have a situation where we want to add a little bit to the behaviour of an already existing method rather than replacing it. Here a mechanism to call a method of the superclass when overriding a method is required. In Ruby, that's super.
class A
  def test
    puts 'in A'
  end
end
class B < A
  def test
    super   # invokes A#test
  end
end
Ruby's super differs from the one in Java. This single word means "call the method with the same name in the superclass". super is a reserved word.
When using super, be careful about the difference between super with no arguments and super whose arguments are omitted. The super whose arguments are omitted passes all the given parameter variables.
class A
  def test( *args )
    p args
  end
end

class B < A
  def test( a, b, c )
    # super with no arguments
    super()   # shows []

    # super with omitted arguments. Same result as super(a, b, c)
    super     # shows [1, 2, 3]
  end
end

B.new.test(1,2,3)
Visibility
In Ruby, even when calling the same method, it may or may not be callable depending on the location (meaning the object). This functionality is usually called "visibility" (whether it is visible). In Ruby, the three types of methods below can be defined.
public
private
protected
public methods can be called from anywhere in any form. private methods can only be called in a form "syntactically" without a receiver. In effect they can only be called by instances of the class in which they were defined and by instances of its subclasses. protected methods can only be called by instances of the defining class and its subclasses. They differ from private methods in that they can still be called from other instances of the same class.
The terms are the same as in C++ but the meaning is slightly different. Be careful.
Usually we control visibility as shown below.
class C
  public
  def a1() end   # becomes public
  def a2() end   # becomes public

  private
  def b1() end   # becomes private
  def b2() end   # becomes private

  protected
  def c1() end   # becomes protected
  def c2() end   # becomes protected
end
Here public, private and protected are method calls without parentheses. These aren't even reserved words.
public and private can also be used with an argument to set the visibility of a particular method. But the mechanism is not interesting, so we'll leave it out.
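The difference between private and protected is easiest to see when one instance touches another. The sketch below uses a hypothetical Account class; the class and all its method names are invented for illustration.

```ruby
# Hypothetical class contrasting protected and private methods.
class Account
  def initialize(balance)
    @balance = balance
  end

  def richer_than?(other)
    balance > other.balance     # protected: another instance of the same class may be the receiver
  end

  def describe
    "balance: #{secret_note}"   # private: called without a receiver
  end

  protected

  def balance
    @balance
  end

  private

  def secret_note
    "about #{@balance}"
  end
end

p Account.new(10).richer_than?(Account.new(5))
puts Account.new(10).describe
```

Calling either balance or secret_note from outside the class raises NoMethodError.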
Module functions
Given a module 'M'. If there are two methods with the exact same content
M.method_name
M#method_name (Visibility is private)
then we call this a module function.
It is not apparent why this should be useful. But let's look at the next example, which is happily used.
Math.sin(5)       # If used only a few times, this is more convenient

include Math
sin(5)            # If used more often, this is more practical
It's important that both functions have the same content. With a different self but with the same code, the behavior should still be the same. Instance variables become extremely difficult to use. Hence such a method is very likely a method in which only procedures are written (like sin). That's why they are called module "functions".
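One way to obtain such a pair of methods is Module#module_function, which copies an instance method to the module itself and makes the instance version private. The sketch below assumes hypothetical names (Geometry, hypotenuse, Triangle).

```ruby
# Hypothetical module with one module function.
module Geometry
  def hypotenuse(a, b)
    Math.sqrt(a * a + b * b)
  end
  module_function :hypotenuse   # defines Geometry.hypotenuse, makes Geometry#hypotenuse private
end

# Hypothetical class that includes the module and calls the function without a receiver.
class Triangle
  include Geometry

  def longest_side(a, b)
    hypotenuse(a, b)
  end
end

p Geometry.hypotenuse(3, 4)
p Triangle.new.longest_side(3, 4)
```

The method uses no instance variables, so it behaves the same whichever object is self.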
Iterators
Ruby's iterators differ a bit from Java's or C++'s iterator classes or the 'Iterator' design pattern. Precisely speaking, those iterators are called exterior iterators, while Ruby's iterators are interior iterators. This is difficult to understand from the definition, so let's explain it with a concrete example.
arr = [0,2,4,6,8]
This array is given and we want to access the elements in order. In C style we would write the following.
i = 0
while i < arr.length
  print arr[i]
  i += 1
end
Using an iterator we can write:
arr.each do |item|
  print item
end
Everything from each do to end is the call to an iterator method. More precisely, each is the iterator method and the part between do and end is the iterator block. The part between the vertical bars is called the block parameters, which become variables to receive the parameters passed from the iterator method to the block.
Saying it a little abstractly, an iterator is something like a piece of code which has been cut out and passed. In our example the piece print item has been cut out and is passed to the each method. Then each takes all the elements of the array in order and passes them to the cut out piece of code.
We can also think the other way round. The parts other than print item are being cut out and enclosed in the each method.
i = 0
while i < arr.length
  print arr[i]
  i += 1
end

arr.each do |item|
  print item
end
Comparison with higher order functions
What comes closest in C to iterators are functions which receive function pointers, that is, higher order functions. But there are two points in which iterators in Ruby and higher order functions in C differ.
Firstly, Ruby iterators can only take one block. For instance we can't do the following.
# Mistake. Several blocks cannot be passed.
array_of_array.each do |i|
  ....
end do |j|
  ....
end
Secondly, Ruby's blocks can share local variables with the code outside.
lvar = 'ok'
[0,1,2].each do |i|
  p lvar    # Can access the local variable outside the block.
end
That's where iterators are convenient.
But variables can only be shared with the outside. They cannot be shared with the inside of the iterator method (e.g. each). Putting it intuitively, only the variables in the place where the source code looks continuous are visible.
Block Local Variables
Local variables which are assigned inside a block stay local to that block; that is, they become block local variables. Let's check it out.
[0].each do
  i = 0
  p i     # 0
end
For now, to create a block, we apply each on an array of length 1 (we can fully leave out the block parameter). In that block, the i variable is first assigned, meaning declared. This makes i block local.
It is said to be block local, so it should not be accessible from the outside. Let's test it.
% ruby -e '
[0].each do
  i = 0
end
p i     # Here occurs an error.
'
-e:5: undefined local variable or method `i' for #<Object:0x40163a9c> (NameError)
When we referenced a block local variable from outside the block, an error surely occurred. Without a doubt it stayed local to the block.
Iterators can also be nested repeatedly. Each time the new block creates another scope.
lvar = 0
[1].each do
  var1 = 1
  [2].each do
    var2 = 2
    [3].each do
      var3 = 3
      # Here lvar, var1, var2, var3 can be seen
    end
    # Here lvar, var1, var2 can be seen
  end
  # Here lvar, var1 can be seen
end
# Here only lvar can be seen
There's one point which you have to keep in mind. Differing from today's major languages, Ruby's block local variables don't do shadowing. Shadowing means, for instance in C, that in the code below the two declared variables i are different.
{
    int i = 3;
    printf("%d\n", i);         /* 3 */
    {
        int i = 99;
        printf("%d\n", i);     /* 99 */
    }
    printf("%d\n", i);         /* 3 (restored) */
}
Inside the block the inner i overshadows the outer i. That's why it's called shadowing.
But what happens with the block local variables of Ruby, where there's no shadowing? Let's look at this example.
i = 0
p i           # 0
[0].each do
  i = 1
  p i         # 1
end
p i           # 1 the change is preserved
Even when we assign i inside the block, if there is the same name outside, it is the one used. Therefore when we assign to i inside, the value of the outer i is changed. On this point there have been many complaints: "This is error prone. Please do shadowing." Each time it nearly turns into a flame war, but so far no conclusion has been reached.
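Both behaviours, sharing an outer variable and keeping a newly assigned one local, can be wrapped in methods so that the results are returned rather than printed. This is a sketch with made-up method names.

```ruby
# Hypothetical helper: the block assigns to the i that already exists outside.
def outer_value_after_block
  i = 0
  [0].each do
    i = 1
  end
  i
end

# Hypothetical helper: j is first assigned inside the block, so it stays block local.
def block_local_visible_outside?
  [0].each do
    j = 0
  end
  defined?(j) ? true : false
end

p outer_value_after_block
p block_local_visible_outside?
```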
The syntax of iterators
There are some smaller topics left.
First, there are two ways to write an iterator. One is the do ~ end as used above, the other one is enclosing in braces. The two expressions below have exactly the same meaning.
arr.each do |i|
  puts i
end

arr.each {|i|    # The author likes a four space indentation for
    puts i       # an iterator with braces.
}
But grammatically the precedence is different. The braces bind much stronger than do ~ end.
m m do .... end    # m(m) do....end
m m { .... }       # m(m() {....})
And iterators are definitely methods, so there are also iterators that take arguments.
re = /^\d/                 # regular expression to match a digit at the beginning of the line
$stdin.grep(re) do |line|  # look repeatedly for this regular expression
  ....
end
yield
Of course users can write their own iterators. Methods which have a yield in their definition text are iterators. Let's try to write an iterator with the same effect as Array#each:
# adding the definition to the Array class
class Array
  def my_each
    i = 0
    while i < self.length
      yield self[i]
      i += 1
    end
  end
end

# this is the original each
[0,1,2,3,4].each do |i|
  p i
end

# my_each works the same
[0,1,2,3,4].my_each do |i|
  p i
end
yield calls the block. At this point control is passed to the block; when the execution of the block finishes, it returns back to the same location. Think of it as a special kind of function call. When the present method does not have a block, a runtime error will occur.
% ruby -e '[0,1,2].each'
-e:1:in `each': no block given (LocalJumpError)
        from -e:1
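A user-written iterator can check for the block itself with block_given? before reaching yield. The following is a sketch; the method name repeat and the choice of ArgumentError are assumptions made for illustration.

```ruby
# Hypothetical iterator: yields 0, 1, ... count-1 to the block.
def repeat(count)
  raise ArgumentError, 'no block given' unless block_given?
  i = 0
  while i < count
    yield i      # control goes to the block and comes back here
    i += 1
  end
end

repeat(3) do |i|
  p i
end
```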
Proc
I said that iterators are like cut out code which is passed as an argument. But we can even more directly make code into an object and carry it around.
twice = Proc.new {|n| n * 2 }
p twice.call(9)   # 18 will be printed
In short, it is like a function. As might be expected from the fact that it is created with new, the return value of Proc.new is an instance of the Proc class.
Proc.new surely looks like an iterator and it is indeed so. It is an ordinary iterator. There's only some mystic mechanism inside Proc.new which turns an iterator block into an object.
Besides, there is a function style method lambda provided which has the same effect as Proc.new. Choose whatever suits you.
twice = lambda {|n| n * 2 }
Iterators and Proc
Why did we start talking all of a sudden about Proc? Because there is a deep relationship between iterators and Proc. In fact, iterator blocks and Proc objects are quite the same thing. That's why one can be transformed into the other.
First, to turn an iterator block into a Proc object one has to put an & in front of the parameter name.
def print_block( &block )
  p block
end

print_block() do end   # Shows something like <Proc:0x40155884>
print_block()          # Without a block nil is printed
With an & in front of the parameter name, the block is transformed into a Proc object and assigned to the variable. If the method is not called as an iterator (there's no block attached), nil is assigned.
And in the other direction, if we want to pass a Proc to an iterator we also use &.
block = Proc.new {|i| p i }
[0,1,2].each(&block)
This code means exactly the same as the code below.
[0,1,2].each {|i| p i }
If we combine these two, we can delegate an iterator block to a method somewhere else.
def each_item( &block )
  [0,1,2].each(&block)
end

each_item do |i|    # same as [0,1,2].each do |i|
  p i
end
Expressions
"Expressions" in Ruby are things with which we can create other expressions or statements by combining them with others. For instance a method call can be another method call's argument, so it is an expression. The same goes for literals. But literals and method calls are not always combinations of elements. On the contrary, the "expressions" which I'm going to introduce always consist of some elements.
if
We probably do not need to explain the if expression. If the conditional expression is true, the body is executed. As explained in Part 1, every object except nil and false is true in Ruby.
if cond0 then
  ....
elsif cond1 then
  ....
elsif cond2 then
  ....
else
  ....
end
elsif/else-clauses can be omitted. Each then as well. But there are some finer requirements concerning then. For this kind of thing, looking at some examples is the best way to understand. The only thing I'd say here is that the code below is all valid.
# 1
if cond then ..... end

# 2
if cond; .... end

# 3
if cond then; .... end

# 4
if cond
then .... end

# 5
if cond
then
  ....
end
And in Ruby, if is an expression, so the entire if expression has a value. It is the value of the body of the clause whose condition is met. For example, if the condition of the first if is true, the value is that of its body.
p(if true  then 1 else 2 end)   #=> 1
p(if false then 1 else 2 end)   #=> 2
p(if false then 1 elsif true then 2 else 3 end)   #=> 2
If there's no match, or the matched clause is empty, the value is nil.
p(if false then 1 end)    #=> nil
p(if true  then   end)    #=> nil
unless
An if with a negated condition is an unless. The following two expressions have the same meaning.
unless cond then          if not (cond) then
  ....                      ....
end                       end
unless can also have an else clause attached, but no elsif can be attached. Needless to say, then can be omitted.
unless also has a value, and the rule that decides it is completely the same as for if. That is, the entire value is the value of the body of the matched clause. If there's no match or the matched clause is empty, the value is nil.
and && or ||
The most likely use of and is probably as a boolean operation, for instance in the conditional expression of an if.
if cond1 and cond2
  puts 'ok'
end
But as in Perl, sh or Lisp, it can also be used as a conditional branch expression. The two following expressions have the same meaning.
if invalid?(key)          invalid?(key) and return nil
  return nil
end
&& and and have the same meaning. What differs is the binding order.
method arg0 &&  arg1    # method(arg0 && arg1)
method arg0 and arg1    # method(arg0) and arg1
Basically the symbolic operator creates an expression which can be an argument (arg). The alphabetical operator creates an expression which cannot become an argument (expr).
As for and, if the evaluation of the left hand side is true, the right hand side will also be evaluated.
On the other hand, or is the opposite of and. If the evaluation of the left hand side is false, the right hand side will also be evaluated.
valid?(key) or return nil
or and || have the same relationship as and and &&. Only the precedence is different.
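The difference in precedence shows up most clearly next to an assignment. A minimal sketch, with hypothetical method names:

```ruby
# Hypothetical helpers contrasting || and or on the right of an assignment.
def with_symbolic_operator
  x = nil || :fallback     # parsed as x = (nil || :fallback)
  x
end

def with_word_operator
  x = nil or :fallback     # parsed as (x = nil) or :fallback
  x
end

p with_symbolic_operator
p with_word_operator
```

With || the fallback value ends up in x; with or the assignment happens first and x stays nil.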
The Conditional Operator
There is a conditional operator similar to C:
cond ? iftrue : iffalse
The space between the symbols is important. If they bump together the following weirdness happens.
cond?iftrue:iffalse   # cond?(iftrue(:iffalse))
The value of the conditional operator is the value of the last executed expression: either the value of the true side or the value of the false side.
while until
Here's a while expression.
while cond do
  ....
end
This is the simplest loop syntax. As long as cond is true the body is executed. The do can be omitted.
until io_ready?(id) do
  sleep 0.5
end
until creates a loop whose condition is the opposite. As long as the condition is false the body is executed. The do can be omitted.
Naturally there is also jump syntax to exit a loop. break as in C/C++/Java is also break, but continue is next. Perhaps next has come from Perl.
i = 1
while true
  if i > 10
    break   # exit the loop
  elsif i % 2 == 0
    i *= 2
    next    # next loop iteration
  end
  i += 1
end
And there is another Perlism: the redo.
while cond
  # (A)
  ....
  redo
  ....
end
It will return to (A) and repeat from there. What differs from next is that it does not check the condition.
I might make it into the world top 100 if the number of Ruby programs written were counted, but I haven't used redo yet. It does not seem to be necessary after all, because I've lived happily without it.
case
A special form of the if expression. It performs branching on a series of conditions. The following left and right expressions are identical in meaning.
case value
when cond1 then                if cond1 === value
  ....                           ....
when cond2 then                elsif cond2 === value
  ....                           ....
when cond3, cond4 then         elsif cond3 === value or cond4 === value
  ....                           ....
else                           else
  ....                           ....
end                            end
The threefold equals === is, like ==, actually a method call. Notice that the receiver is the object on the left hand side. Concretely, if it is the === of a Range, it checks whether the value falls within the range. If it is a Class, it tests whether the value is an instance of it. If it is a regular expression, it tests if the value matches. And so on. Since case has many grammatical elements, to list them all would be tedious, thus we will not cover them in this book.
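How the when clauses dispatch through === can be sketched with a Class, a Regexp and a Range as receivers. The method name classify and its labels are hypothetical.

```ruby
# Hypothetical helper: each when clause calls === on its own condition object.
def classify(value)
  case value
  when Integer   then 'integer'   # Integer === value
  when /\A\d+\z/ then 'digits'    # /\A\d+\z/ === value
  when 'a'..'z'  then 'letter'    # ('a'..'z') === value
  else                'other'
  end
end

p classify(3)
p classify('42')
p classify('q')
p classify(3.5)
```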
Exceptions
This is a control structure which can pass over method boundaries and transmit errors. Readers who are acquainted with C++ or Java will know about exceptions. Ruby exceptions are basically the same.
In Ruby, exceptions are raised with the function style method raise. raise is not a reserved word.
raise ArgumentError, "wrong number of argument"
In Ruby, exceptions are instances of the Exception class and its subclasses. This form takes an exception class as its first argument and an error message as its second argument. In the above case an instance of ArgumentError is created and "thrown". The exception object skips the part after the raise and starts to return upwards through the method call stack.
def raise_exception
  raise ArgumentError, "wrong number of argument"
  # the code after the exception will not be executed
  puts 'after raise'
end
raise_exception()
If nothing blocks the exception, it will move on and on and finally reach the top level. When there's no place to return to any more, ruby prints a message and ends with a non-zero exit code.
% ruby raise.rb
raise.rb:2:in `raise_exception': wrong number of argument (ArgumentError)
        from raise.rb:7
However, an exit would be sufficient for this; for an exception there should be a way to set handlers. In Ruby, begin ~ rescue ~ end is used for this. It resembles the try ~ catch in C++ and Java.
def raise_exception
  raise ArgumentError, "wrong number of argument"
end

begin
  raise_exception()
rescue ArgumentError => err then
  puts 'exception catched'
  p err
end
rescue is a control structure which captures exceptions; it catches exception objects of the specified class and its subclasses. In the above example, an instance of ArgumentError comes flying into the place where ArgumentError is targeted, so it matches this rescue. By => err the exception object is assigned to the local variable err, and after that the rescue part is executed.
% ruby rescue.rb
exception catched
#<ArgumentError: wrong number of argument>
When an exception is rescued, execution passes through the rescue and continues with the subsequent code as if nothing happened, but we can also make it retry from the begin. To do so, retry is used.
begin    # the place to return
  ....
rescue ArgumentError => err then
  retry  # retry your life
end
We can omit the => err and the then after rescue. We can also leave out the exception class. In this case, it means the same as when the StandardError class is specified.
If we want to catch more exception classes, we can just write them in line. When we want to handle different errors differently, we can specify several rescue clauses.
begin
  raise IOError, 'port not ready'
rescue ArgumentError, TypeError
rescue IOError
rescue NameError
end
When written in this way, a rescue clause that matches the exception class is searched for in order from the top. Only the matched clause will be executed. For instance, only the clause of IOError will be executed in the above case.
On the other hand, when there is an else clause, it is executed only when there is no exception.
begin
  nil    # Of course no error will occur here
rescue ArgumentError
  # This part will not be executed
else
  # This part will be executed
end
Moreover an ensure clause will be executed in every case: when there is no exception, when there is an exception, rescued or not.
begin
  f = File.open('/etc/passwd')
  # do stuff
ensure   # this part will be executed anyway
  f.close
end
By the way, this begin expression also has a value. The value of the whole begin ~ end expression is the value of the part which was executed last among the begin/rescue/else clauses. That is, the last statement of the clauses aside from ensure. The reason why ensure is not counted is probably because ensure is usually used for cleanup (thus it is not the main line).
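The rule for the value of a begin expression can be tried with a small sketch. The method name parse_number and the fallback value 0 are assumptions made for illustration; Integer() is the standard conversion method, which raises ArgumentError for a string that is not a number.

```ruby
# Hypothetical helper: the value of the begin expression becomes the return value.
def parse_number(text)
  begin
    Integer(text)     # value when nothing is raised
  rescue ArgumentError
    0                 # value when the rescue clause runs
  ensure
    :cleanup          # always evaluated, but never the value of the expression
  end
end

p parse_number('42')
p parse_number('abc')
```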
Variables and Constants
Referring to a variable or a constant. The value is the object the variable points to. We already talked in too much detail about the various behaviors.
lvar
@ivar
@@cvar
CONST
$gvar
I want to add one more thing. Among the variables starting with $, there are special kinds. They are not necessarily global variables and some have strange names.
First the Perlish variables $_ and $~. $_ saves the return value of gets and other methods, $~ contains the last match of a regular expression. They are incredible variables which are local variables and simultaneously thread local variables.
And the $! to hold the exception object when an error has occurred, the $? to hold the status of a child process, the $SAFE to represent the security level: they are all thread local.
Assignment
Variable assignments are all performed by =. All variables are typeless. What is saved is a reference to an object. In the implementation, it is a VALUE (pointer).
var = 1
obj = Object.new
@ivar = 'string'
@@cvar = ['array']
PI = 3.1415926535
$gvar = {'key' => 'value'}
However, as mentioned earlier, obj.attr = val is not an assignment but a method call.
Self Assignment
var += 1
This syntax is also in C/C++/Java. In Ruby,
var = var + 1
it is a shortcut of this code. Differing from C, the Ruby + is a method and thus part of the library. In C, the whole meaning of += is built into the language processor itself. And in C++, += and *= can be wholly overloaded, but we cannot do this in Ruby. In Ruby += is always defined as the combination of + and assignment.
We can also combine self assignment and an attribute-access-flavor method. The result looks even more like an attribute.
class C
  def i() @i end          # A method definition can be written in one line.
  def i=(n) @i = n end
end

obj = C.new
obj.i = 1
obj.i += 2    # obj.i = obj.i + 2
p obj.i       # 3
If there is += there might also be ++, but this is not the case. Why is that so? In Ruby assignment is dealt with at the language level. But on the other hand methods are in the library. Keeping these two, the world of variables and the world of objects, strictly apart is an important peculiarity of Ruby. If ++ were introduced the separation might easily be broken. That's why there's no ++.
Some people don't want to go without the brevity of ++. It has been proposed again and again on the mailing list but was always turned down. I am also in favor of ++, but not so much that I can't do without it, and I have not felt much need for ++ in Ruby in the first place, so I've kept silent and decided to forget about it.
defined?
defined? is a syntax of a quite different color in Ruby. It tells at runtime whether an expression value is "defined" or not.
var = 1
defined?(var)   #=> true
In other words, it tells whether a value can be obtained from the expression received as its argument (is it okay to call it so?) when the expression is evaluated. That said, of course you can't write an expression causing a parse error, and it cannot detect whether the expression contains a method call which raises an error.
I would have loved to tell you more about defined? but it will not appear again in this book. What a pity.
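For a feel of what defined? reports, here is a small sketch. It assumes a current Ruby, where the result is a descriptive string (which counts as true) or nil; the method name defined_report and the name no_such_name are made up.

```ruby
# Hypothetical helper collecting the results of defined? for four kinds of expression.
def defined_report
  var = 1
  [
    defined?(var),           # a local variable
    defined?(puts),          # a method call
    defined?(String),        # a constant
    defined?(no_such_name)   # nothing by this name exists
  ]
end

p defined_report
```

The argument is only inspected, not evaluated, so the undefined name does not raise an error.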
Statements
A statement is basically something which cannot be combined with other syntax; in other words, statements are lined up vertically.
But it does not mean there's no evaluated value. For instance there are return values for class definition statements and method definition statements. However, relying on them is rarely recommended and isn't useful, so you'd better not take them too seriously. Here we also skip the value of each statement.
The Ending of a statement
Up to now we just said "For now one line's one statement". But Ruby's statement endings aren't that straightforward.
First, a statement can be ended explicitly with a semicolon as in C. Of course then we can write two or more statements on one line.
puts 'Hello, World!'; puts 'Hello, World once more!'
On the other hand, when the expression apparently continues, such as just after an opening parenthesis, a dyadic operator, or a comma, the statement continues automatically.
# 1 + 3 * method(6, 7 + 8)
1 +
  3 *
     method(
            6,
            7 + 8)
But it's also totally no problem to use a backslash to explicitly indicate the continuation.
p 1 + \
  2
The Modifiers if and unless
The if modifier is an irregular version of the normal if. The programs on the left and right mean exactly the same.
on_true() if cond                if cond
                                   on_true()
                                 end
The unless is the negative version. Guard statements (statements which exclude exceptional conditions) can be conveniently written with it.
The Modifiers while and until
while and until also have a back notation.
process() while have_content?
sleep(1) until ready?
Combining this with begin and end gives a do-while-loop like in C.
begin
  res = get_response(id)
end while need_continue?(res)
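The do-while character of begin ~ end while, namely that the body runs once before the condition is looked at, can be checked with a counter. A sketch with a hypothetical method name:

```ruby
# Hypothetical helper: counts how many times the body of the loop ran.
def count_up_to(limit)
  count = 0
  begin
    count += 1           # runs before the condition is checked for the first time
  end while count < limit
  count
end

p count_up_to(3)
p count_up_to(0)   # the body still runs once
```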
Class Definition
class C < SuperClass
  ....
end
Defines the class C which inherits from SuperClass.
We talked quite extensively about classes in Part 1. This statement is executed, the class being defined becomes self within the statement, and arbitrary expressions can be written within it. Class definitions can be nested. They form the foundation of the Ruby execution image.
Method Definition
def m(arg)
end
I've already written about method definition and won't add more. This section is here to make it clear that method definitions also belong to statements.
Singleton method definition
We already talked a lot about singleton methods in Part 1. They do not belong to classes but to objects; in fact, they belong to singleton classes. We define singleton methods by putting the receiver in front of the method name. Parameter declaration is done the same way as with ordinary methods.
def obj.some_method
end

def obj.some_method2( arg1, arg2, darg = nil, *rest, &block )
end
Definition of Singleton methods
class << obj
  ....
end
From the viewpoint of purpose, it is the statement to define several singleton methods in a bundle. From the viewpoint of mechanism, it is the statement in which the singleton class of obj becomes self when executed. In all of a Ruby program, this is the only place where a singleton class is exposed.
class << obj
  p self  #=> #<Class:#<Object:0x40156fcc>>   # the singleton class of obj
  def a() end   # def obj.a
  def b() end   # def obj.b
end
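That the methods defined this way belong to one object only can be confirmed with a sketch like the following. The constant GREETER and the method names are hypothetical.

```ruby
# Hypothetical object that receives two singleton methods in a bundle.
GREETER = Object.new

class << GREETER
  def greet
    'hello'
  end

  def shout
    greet.upcase
  end
end

p GREETER.greet
p GREETER.shout
p Object.new.respond_to?(:greet)   # other objects are not affected
```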
MultipleAssignmentWithamultipleassignment,severalassignmentscanbedoneallatonce.Thefollowingisthesimplestcase:
a,b,c=1,2,3
It’sexactlythesameasthefollowing.
a=1b=2c=3
Justbeingconciseisnotinteresting.infact,whenanarraycomesintobemixed,itbecomessomethingfunforthefirsttime.
a,b,c=[1,2,3]
Thisalsohasthesameresultastheabove.Furthermore,therighthandsidedoesnotneedtobeagrammaticallistoraliteral.Itcanalsobeavariableoramethodcall.
tmp=[1,2,3]a,b,c=tmpret1,ret2=some_method()#some_methodmightprobablyreturnseveralvalues
Precisely speaking, it works as follows. Here we'll assume obj is (the object of) the value of the right-hand side:

1. obj, if it is an array
2. if its to_ary method is defined, it is used to convert obj to an array
3. [obj]

The right-hand side is decided by following this procedure, and then the assignments are performed. This means that the evaluation of the right-hand side and the operation of assignment are totally independent of each other.
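The three cases of this procedure can be observed one by one. In the sketch below, `Pair` is a made-up class that exists only to provide a `to_ary` method.

```ruby
# Pair is a hypothetical class used only for this illustration.
class Pair
  def initialize(l, r) @l, @r = l, r end
  def to_ary() [@l, @r] end
end

a, b = [1, 2]            # case 1: the value is already an array
c, d = Pair.new(3, 4)    # case 2: converted through to_ary
e, f = 5                 # case 3: treated like [5], so f becomes nil

p [a, b, c, d, e, f]     # [1, 2, 3, 4, 5, nil]
```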
And it goes on: both the left and right-hand sides can be nested to any depth.

    a, (b, c, d) = [1, [2, 3, 4]]
    a, (b, (c, d)) = [1, [2, [3, 4]]]
    (a, b), (c, d) = [[1, 2], [3, 4]]

After executing this program, each line results in a=1 b=2 c=3 d=4.
And it goes on. The left-hand side can contain index or attribute assignments.

    i = 0
    arr = []
    arr[i], arr[i+1], arr[i+2] = 0, 2, 4
    p arr    # [0, 2, 4]

    obj.attr0, obj.attr1, obj.attr2 = "a", "b", "c"

And as with method parameters, * can be used to receive the rest in a bundle.

    first, *rest = 0, 1, 2, 3, 4
    p first  # 0
    p rest   # [1, 2, 3, 4]

When all of these are used at once, it's extremely confusing.
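To get a feeling for how confusing it can become, here is a small sketch that uses an index assignment, nested attribute assignments and a `*` receiver in a single statement. `Point` is a hypothetical Struct introduced only for this example.

```ruby
Point = Struct.new(:x, :y)   # hypothetical helper type
pt  = Point.new
arr = []

# index assignment, nested attribute assignments, and a splat, all at once
arr[0], (pt.x, pt.y), *rest = 10, [20, 30], 40, 50

p arr           # [10]
p [pt.x, pt.y]  # [20, 30]
p rest          # [40, 50]
```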
Block parameter and multiple assignment

We brushed over block parameters when we were talking about iterators. But there is a deep relationship between them and multiple assignment. Take the following case, for instance.

    array.each do |i|
      ....
    end

Every time the block is called, the yielded arguments are multi-assigned to i. Here there's only one variable on the left-hand side, so it does not look like multiple assignment. But if there are two or more variables, it looks a little more like it. For instance, Hash#each iterates over pairs of keys and values, so we usually call it like this:

    hash.each do |key, value|
      ....
    end

In this case, an array consisting of a key and a value is yielded from the hash each time.

Hence we can also do the following by using nested multiple assignment.

    # [[key,value],index] are yielded
    hash.each_with_index do |(key, value), index|
      ....
    end
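A runnable version of the two calls above, with a small made-up hash, might look like this:

```ruby
hash = { "a" => 1, "b" => 2 }   # sample data for illustration

pairs = []
hash.each { |key, value| pairs << "#{key}=#{value}" }

indexed = []
hash.each_with_index do |(key, value), index|   # nested, like (key, value), index = [[k, v], i]
  indexed << [index, key, value]
end

p pairs    # ["a=1", "b=2"]
p indexed  # [[0, "a", 1], [1, "b", 2]]
```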
alias

    class C
      alias new orig
    end

Defines another method new with the same body as the already defined method orig. An alias is similar to a hard link in a Unix file system: it is a means of assigning multiple names to one method body. To put it the other way around, because the names themselves are independent of each other, even if one method name is overridden by a subclass method, the other one still keeps the same behavior.
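The independence of the two names can be checked with a subclass that overrides only one of them. The class and method names below are made up for the illustration.

```ruby
class Base
  def orig() "Base#orig" end
  alias new_name orig        # a second name for the same method body
end

class Derived < Base
  def orig() "Derived#orig" end   # overrides only the name orig
end

d = Derived.new
p d.orig      # "Derived#orig"
p d.new_name  # "Base#orig"
```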
undef

    class C
      undef method_name
    end

Prohibits the calling of C#method_name. It's not just a simple revoking of the definition. Even if there were a method of that name in the superclass, calling it would also be forbidden. In other words, the method is exchanged for a sign which says "This method must not be called".

undef is extremely powerful; once it is set it cannot be deleted from the Ruby level, because it is used to cover up contradictions in the internal structure. The only remaining measure is to inherit and define a method in the lower class. Even in that case, calling super would cause an error.

The method which corresponds to unlink in a file system is Module#remove_method. While defining a class, self refers to that class, so we can call it as follows. (Remember that Class is a subclass of Module.)

    class C
      remove_method(:method_name)
    end

But even with remove_method one cannot cancel the undef. That's because the sign put up by undef prohibits any kind of search.

((errata: It can be redefined by using def))
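The difference between the two, and the behavior mentioned in the errata, can be sketched as follows. All class names here are invented for the example.

```ruby
class Parent
  def hello() "Parent#hello" end
end

class Removed < Parent
  def hello() "Removed#hello" end
  remove_method :hello       # like unlink: the superclass method shows through again
end

class Undefined < Parent
  undef hello                # puts up the "must not be called" sign
end

blocked = begin
  Undefined.new.hello
  false
rescue NoMethodError
  true
end

class Undefined
  def hello() "Undefined#hello (redefined)" end   # the errata: def works again
end

p Removed.new.hello    # "Parent#hello"
p blocked              # true
p Undefined.new.hello  # "Undefined#hello (redefined)"
```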
Some more small topics

Comments

    # examples of bad comments.
    1 + 1            # compute 1+1.
    alias my_id id   # my_id is an alias of id.

Everything from a # to the end of the line is a comment. It has no meaning for the program.

Embedded documents

=begin
This is an embedded document.
It's so called because it is embedded in the program.
Plain and simple.
=end

An embedded document stretches from a =begin that is outside a string and at the beginning of a line to a =end. The interior can be arbitrary. The program ignores it as a mere comment.
Multi-byte strings

When the global variable $KCODE is set to either EUC, SJIS or UTF8, strings encoded in euc-jp, shift_jis, or utf8 respectively can be used as string data.

And if the option -Ke, -Ks or -Ku is given to the ruby command, multi-byte strings can be used within the Ruby code itself. String literals, regular expressions and even operator names can contain multi-byte characters. Hence it is possible to do something like this:

    def 表示(arg)
      puts arg
    end

    表示 'にほんご'

But I really cannot recommend doing things like that.
The original work is Copyright © 2002 - 2004 Minero AOKI.
Translated by Vincent ISAMBART and Clifford Escobar CAOILE
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License

Ruby Hacking Guide

Translated by Vincent ISAMBART & ocha-
Chapter 9: yacc crash course

Outline
Parser and scanner

How to write parsers for programming languages has been an active area of research for a long time, and there is a quite firmly established tactic for doing it. If we limit ourselves to a grammar that is not too strange (or ambiguous), we can solve this problem by following this method.

The first part consists in splitting a string into a list of words (or tokens). The part that does this is called a scanner or lexer. The term "lexical analyzer" is also used, but it is too complicated to say, so we'll use the name scanner.

When speaking about scanners, common sense first says "there are generally spaces at the end of a word". And in practice, most programming languages were made like this, because it's the easiest way.

There can also be exceptions. For example, in old Fortran, white spaces did not have any meaning. This means a white space did not end a word, and you could put spaces in the name of a variable. However, that made parsing very complicated, so the compiler vendors, one by one, started ignoring that standard. Finally Fortran 90 followed this trend and made the fact that white spaces have an impact the standard.

By the way, it seems the reason white spaces had no meaning in Fortran 77 was that, when writing programs on punch cards, it was easy to make errors in the number of spaces.
List of symbols

I said that the scanner spits out a list of words (tokens), but, to be exact, what the scanner creates is a list of "symbols", not words.

What are symbols? Let's take numbers as an example. In a programming language, 1, 2, 3, 99 are all "numbers". They can all be handled the same way by the grammar. Where we can write 1, we can also write 2 or 3. That's why the parser does not need to handle them in different ways. For numbers, "number" is enough.

"number", "identifier" and others can be grouped together as "symbols". But be careful not to confuse this with the Symbol class.

The scanner first splits the string into words and determines what these symbols are. For example, NUMBER or DIGIT for numbers, IDENTIFIER for names like "name", IF for the reserved word if. These symbols are then given to the next phase.
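As a rough illustration of what "splitting into words and determining their symbols" means, here is a toy scanner. It is only a sketch: the function name `tokenize`, the regular expressions and the set of symbols are assumptions made for this example, and ruby's real scanner works quite differently.

```ruby
# Toy scanner: returns [symbol, word] pairs. Only a sketch for illustration.
def tokenize(src)
  tokens = []
  src.scan(/\d+|[A-Za-z_]\w*|\S/) do |word|
    sym = case word
          when /\A\d+\z/      then :NUMBER
          when "if"           then :IF           # reserved word
          when /\A[A-Za-z_]/  then :IDENTIFIER
          else word                               # a character like "=" stands for itself
          end
    tokens << [sym, word]
  end
  tokens
end

p tokenize("if name = 99")
# [[:IF, "if"], [:IDENTIFIER, "name"], ["=", "="], [:NUMBER, "99"]]
```

Note that 1, 2 and 99 would all come out as the same symbol NUMBER; only the attached word differs.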
Parser generator

The list of words and symbols spat out by the scanner is going to be used to form a tree. This tree is called a syntax tree.

The name "parser" is also sometimes used to include both the scanner and the creation of the syntax tree. However, we will use the narrow sense of "parser", the creation of the syntax tree. How does this parser make a tree from the list of symbols? In other words, what should we focus on to find the tree corresponding to a piece of code?

The first way is to focus on the meaning of the words. For example, let's suppose we find the word var. If the definition of the local variable var has been found before this, we'll understand it's the reading of a local variable.

Another way is to focus only on what we see. For example, if after an identifier comes a '=', we'll understand it's an assignment. If the reserved word if appears, we'll understand it's the start of an if statement.

The latter method, focusing only on what we see, is the current trend. In other words, the language must be designed so that it can be analyzed just by looking at the list of symbols. This choice was made because this way is simpler, can be more easily generalized and can therefore be automated using tools. These tools are called parser generators.
The most used parser generator under UNIX is yacc. Like many others, ruby's parser is written using yacc. The input file for this tool is parse.y. That's why, to be able to read ruby's parser, we need to understand yacc to some extent. (Note: Starting from 1.9, ruby requires bison instead of yacc. However, bison is mainly yacc with additional functionality, so this does not diminish the interest of this chapter.)

This chapter will be a simple presentation of yacc to be able to understand parse.y, and therefore we will limit ourselves to what's needed to read parse.y. If you want to know more about parsers and parser generators, I recommend a book I wrote called "Rubyを256倍使うための本 無道編" (The book to use 256 times more of Ruby - Unreasonable book). I do not recommend it because I wrote it, but because in this field it's the easiest book to understand. And besides, it's cheap, so the stakes will be low.

Nevertheless, if you would like a book from someone else (or can't read Japanese), I recommend O'Reilly's "lex & yacc programming" by John R. Levine, Tony Mason and Doug Brown. And if you are still not satisfied, you can also read "Compilers" (also known as the "dragon book" because of the dragon on its cover) by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman.
Grammar

Grammar file

The input file for yacc is called the "grammar file", as it's the file where the grammar is written. The convention is to name this grammar file *.y. It will be given to yacc, which will generate C source code. This file can then be compiled as usual (figure 1 shows the full process).

Figure 1: File dependencies

Traditionally the output file name is always y.tab.c and cannot be changed. Recent versions of yacc usually allow changing it on the command line, but for compatibility it is safer to keep y.tab.c. By the way, it seems the tab of y.tab.c comes from table, as lots of huge tables are defined in it. It's good to have a look at the file once.
The grammar file's content has the following form:

▼ General form of the grammar file

    %{
    Header
    %}
    %union ....
    %token ....
    %type ....

    %%
    Rules part
    %%
    User defined part

yacc's input file is first divided into 3 parts by %%. The first part is called the definition part, and contains a lot of definitions and setups. Between %{ and %} we can write anything we want in C, for example necessary macros. After that, the instructions starting with % are special yacc instructions. Every time we use one, we'll explain it.

The middle part of the file is called the rules part, and is the most essential part for yacc. It's where the grammar we want to parse is written. We'll explain it in detail in the next section.

The last part of the file, the user defined part, can be used freely by the user. yacc just copies this part verbatim into the output file. It's used, for example, to put auxiliary routines needed by the parser.
What does yacc do

What yacc takes care of is mainly this rules part in the middle. yacc takes the grammar written there and uses it to make a function called yyparse(). It's the parser, in the narrow sense of the word.

In the narrow sense, so it means a scanner is needed. However, yacc won't take care of it; it must be written by the user. The scanner is the function named yylex().

Even though yacc creates yyparse(), it only takes care of its core part. The "actions" we'll mention later are out of its scope. You might think the part done by yacc is too small, but that's not the case. This "core part" is so important that yacc has survived to this day even though we keep complaining about it.

But what on earth is this core part? That's what we're going to see.
BNF

When we want to write a parser in C, its code will be "cut the string this way, make this an if statement…" When using parser generators, we say the opposite, that is "I would like to parse this grammar." Doing this creates for us a parser to handle the grammar. This means telling the specification gives us the implementation. That's the convenient point of yacc.

But how can we tell the specification? With yacc, the method of description used is BNF (Backus-Naur Form). Let's look at a very simple example.

    if_stmt: IF expr THEN stmt END

Let's look separately at what's on the left and on the right of the ":". What I mean here is that the part on the left side, if_stmt, is equal to the right part. In other words, I'm saying that:

if_stmt and IF expr THEN stmt END are equivalent.

Here, if_stmt, IF, expr… are all "symbols". expr is the abbreviation of expression, stmt of statement. This must surely be the declaration of the if statement.

One definition is called a rule. The part on the left of ":" is called the left side and the right part is called the right side. This is quite easy to remember.
But something is missing. We do not want an if statement without being able to use else. And even if we could write else, having to always write the else even when it's useless would be cumbersome. In this case we could do the following:

    if_stmt: IF expr THEN stmt END
           | IF expr THEN stmt ELSE stmt END

"|" means "or".

if_stmt is either "IF expr THEN stmt END" or "IF expr THEN stmt ELSE stmt END".

That's it.

Here I would like you to pay attention to the split done with |. With just this, one more rule is added. In fact, punctuating with | is just a shorter way to repeat the left side. The previous example has exactly the same meaning as the following:

    if_stmt: IF expr THEN stmt END
    if_stmt: IF expr THEN stmt ELSE stmt END

This means two rules are defined in the example.
This is not enough to complete the definition of the if statement. That's because the symbols expr and stmt are not sent by the scanner, so their rules must be defined. To be closer to Ruby, let's boldly add some rules.

    stmt   : if_stmt
           | IDENTIFIER '=' expr   /* assignment */
           | expr

    if_stmt: IF expr THEN stmt END
           | IF expr THEN stmt ELSE stmt END

    expr   : IDENTIFIER       /* reading a variable */
           | NUMBER           /* integer constant */
           | funcall          /* FUNction CALL */

    funcall: IDENTIFIER '(' args ')'

    args   : expr             /* only one parameter */

I used two new elements: comments in the same form as in C, and a character written as '='. This '=' is of course also a symbol. Symbols like "=" differ from numbers in that there is only one variety of them. That's why such symbols can be written directly as '='. It would be great to be able to write strings in the same way for, say, reserved words, but due to limitations of the C language this cannot be done.

We keep adding rules like this until the whole grammar is written. With yacc, the left side of the first written rule is "the whole grammar we want to express". So in this example, stmt expresses the whole program.
That was a little too abstract. Let's explain this a little more concretely. By "stmt expresses the whole program", I mean that stmt, and all the rows of symbols expressed as equivalent to it by the rules, are recognized as the grammar. For example, stmt and stmt are equivalent. Of course. Then expr is equivalent to stmt. That's expressed like this in the rules. Then, NUMBER and stmt are equivalent. That's because NUMBER is expr and expr is stmt.

We can also say that more complicated things are equivalent.

    stmt
    ↓
    if_stmt
    ↓
    IF expr THEN stmt END
       ↓         ↓
    IF IDENTIFIER THEN expr END
                       ↓
    IF IDENTIFIER THEN NUMBER END

When it has been expanded this far, all elements have become symbols sent by the scanner. This means such a sequence of symbols is correct as a program. Or, putting it the other way around, if this sequence of symbols is sent by the scanner, the parser can understand it by following the expansion in the opposite order.

    IF IDENTIFIER THEN NUMBER END
                       ↓
    IF IDENTIFIER THEN expr END
       ↓               ↓
    IF expr THEN stmt END
    ↓
    if_stmt
    ↓
    stmt

And stmt is the symbol expressing the whole program. That's why this sequence of symbols is a correct program for the parser. When that is the case, the parsing routine yyparse() ends by returning 0.

By the way, the technical term expressing that the parser succeeded is that it "accepted" the input. The parser is like a government office: if you do not fill in the boxes of the documents exactly as it asked you to, it will refuse them. The accepted sequences of symbols are the ones for which the boxes were filled in correctly. Parsers and government offices are strangely similar, for instance in the fact that they care about details of the specification and that they use complicated terms.
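To make "accepting" concrete, the rules above can be written down as data and checked mechanically. The following is only a sketch under stated assumptions: it tries the expansions top-down with backtracking, which is not how the parser generated by yacc works, and the names `GRAMMAR`, `ends` and `accept?` are invented for this example.

```ruby
# The rules from the text. Symbols without an entry are terminals.
GRAMMAR = {
  stmt:    [[:if_stmt], [:IDENTIFIER, "=", :expr], [:expr]],
  if_stmt: [[:IF, :expr, :THEN, :stmt, :END],
            [:IF, :expr, :THEN, :stmt, :ELSE, :stmt, :END]],
  expr:    [[:IDENTIFIER], [:NUMBER], [:funcall]],
  funcall: [[:IDENTIFIER, "(", :args, ")"]],
  args:    [[:expr]],
}

# Returns every position at which `symbol` can end when it starts at `pos`.
def ends(symbol, tokens, pos)
  rules = GRAMMAR[symbol]
  return (tokens[pos] == symbol ? [pos + 1] : []) unless rules   # terminal symbol
  rules.flat_map do |rhs|
    rhs.reduce([pos]) { |starts, s| starts.flat_map { |at| ends(s, tokens, at) } }
  end.uniq
end

# The input is accepted if stmt can cover the whole sequence.
def accept?(tokens)
  ends(:stmt, tokens, 0).include?(tokens.size)
end

p accept?([:IF, :IDENTIFIER, :THEN, :NUMBER, :END])   # true
p accept?([:IF, :IDENTIFIER, :THEN, :END])            # false
```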
Terminal symbols and nonterminal symbols

Well, in the confusion of the moment I used the expression "symbols coming from the scanner" without explaining it. So let's explain this. I use the one word "symbol", but there are two types.

The first type of symbols are the ones sent by the scanner. They are, for example, IF, THEN, END, '=', … They are called terminal symbols. That's because, as in the quick expansion we did before, they are what we find lined up at the end. In this chapter terminal symbols are always written in capital letters. However, symbols like '=' between quotes are special. Symbols like this are all terminal symbols, without exception.

The other type of symbols are the ones that never come from the scanner, for example if_stmt, expr or stmt. They are called nonterminal symbols. As they don't come from the scanner, they only exist in the parser. Nonterminal symbols also always appear at one moment or another as the left side of a rule. In this chapter, nonterminal symbols are always written in lowercase letters.
How to test

I'm now going to tell you the way to process the grammar file with yacc.

    %token A B C D E
    %%
    list: A B C
        | de

    de  : D E

First, put all the terminal symbols used after %token. However, you do not have to list the symbols written with quotes (like '='). Then, put %% to mark a change of section and write the grammar. That's all.

Let's now process this.

    % yacc first.y
    % ls
    first.y  y.tab.c
    %

Like most Unix tools, "silence means success".

There are also implementations of yacc that need semicolons at the end of (groups of) rules. When that is the case we need to do the following:

    %token A B C D E
    %%
    list: A B C
        | de
        ;

    de  : D E
        ;

I hate these semicolons, so in this book I'll never use them.
Void rules

Let's now look a little more at some of the established ways of describing grammars. I'll first introduce void rules.

    void:

There's nothing on the right side, so this rule is "void". For example, the two following targets mean exactly the same thing.

    target: A B C

    target: A void B void C
    void  :

What is the use of such a thing? It's very useful. Take the following case, for example.

    if_stmt : IF expr THEN stmts opt_else END

    opt_else:
            | ELSE stmts

Using void rules, we can cleverly express the fact that "the else section may be omitted". Compared to the rules made previously using two definitions, this way is shorter and we do not have to spread the burden over several rules.
Recursive definitions

The following example is a little harder to understand.

    list: ITEM         /* rule 1 */
        | list ITEM    /* rule 2 */

This expresses a list of one or more items, in other words any of the following lists of symbols:

    ITEM
    ITEM ITEM
    ITEM ITEM ITEM
    ITEM ITEM ITEM ITEM
     :

Do you understand why? First, according to rule 1, list can be read as ITEM. If you merge this with rule 2, list can be ITEM ITEM.

    list: list ITEM
        = ITEM ITEM

We now understand that the list of symbols ITEM ITEM is equivalent to list. By applying rule 2 to list again, we can say that 3 ITEMs are also equivalent to list. By continuing this process, the list can grow to any size. This is something like mathematical induction.
I'll now show you the next example. The following example expresses the lists with 0 or more ITEM.

    list:
        | list ITEM

First, the first line means "list is equivalent to (void)". By void I mean the list with 0 ITEM. Then, by looking at rule 2, we can say that "list ITEM" is equivalent to 1 ITEM. That's because list is equivalent to void.

    list: list   ITEM
        = (void) ITEM
        =        ITEM

By applying the same replacement operation multiple times, we can understand that list expresses a list of 0 or more items.

With this knowledge, "lists of 2 or more ITEM" or "lists of 3 or more ITEM" are easy, and we can even create "lists of an even number of elements".

    list:
        | list ITEM ITEM
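The induction-like growth can be imitated by starting from the non-recursive rule and applying the recursive rule a given number of times. This is just a sketch; `expand` is a helper name made up for this example.

```ruby
# Start from the right side of the base rule, then apply the recursive rule
# (list: list <step>) `times` times.
def expand(base, step, times)
  times.times.reduce(base) { |list, _| list + step }
end

p expand([:ITEM], [:ITEM], 2)              # one or more: [:ITEM, :ITEM, :ITEM]
p expand([], [:ITEM], 0)                   # zero or more: []
p expand([], [:ITEM, :ITEM], 3).size       # even number of elements: 6
```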
Construction of values

This abstract talk has lasted long enough, so in this section I'd really like to go on with a more concrete talk.

Shift and reduce

Up until now, various ways to write grammars have been explained, but what we want is to be able to build a syntax tree. However, I'm afraid to say, as might be expected, telling yacc only the rules is not enough to let it build a syntax tree. Therefore, this time, I'll tell you the way to build a syntax tree by adding something to the rules.

We'll first see what the parser does during its execution. We'll use the following simple grammar as an example.

    %token A B C
    %%
    program: A B C

In the parser there is a stack called the semantic stack. The parser pushes onto it all the symbols coming from the scanner. This move is called "shifting the symbols".

    [ A B ] ← C   shift

And when the right side of any rule is equal to the end of the stack, it is "interpreted". When this happens, the sequence of the right-hand side is replaced by the symbol of the left-hand side.

    [ A B C ]
        ↓         reduction
    [ program ]

This move is called "reducing A B C to program". The term sounds a little pompous, but in short it is like having enough tiles of haku, hatsu and chu respectively so that they become "Big three dragons" in Japanese Mahjong… though this might be irrelevant.

And since program expresses the whole program, if there's only a program on the stack, it probably means the whole program has been found. Therefore, if the input ends just here, it is accepted.
Let's try with a little more complicated grammar.

    %token IF E S THEN END
    %%
    program : if

    if      : IF expr THEN stmts END

    expr    : E

    stmts   : S
            | stmts S

The input from the scanner is this.

    IF E THEN S S S END

The transitions of the semantic stack in this case are shown below.

    Stack                        Move
    (empty at first)
    IF                           shift IF
    IF E                         shift E
    IF expr                      reduce E to expr
    IF expr THEN                 shift THEN
    IF expr THEN S               shift S
    IF expr THEN stmts           reduce S to stmts
    IF expr THEN stmts S         shift S
    IF expr THEN stmts           reduce stmts S to stmts
    IF expr THEN stmts S         shift S
    IF expr THEN stmts           reduce stmts S to stmts
    IF expr THEN stmts END       shift END
    if                           reduce IF expr THEN stmts END to if
    program                      reduce if to program
                                 accept.
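The table can be reproduced with a very naive simulation. This is only a sketch under a strong assumption: it reduces greedily whenever the end of the stack equals some right side (trying longer right sides first), whereas the parser generated by yacc decides with precomputed tables. The names `RULES` and `parse` are invented for this example.

```ruby
# [left side, right side]; longer right sides come first so that
# "stmts S" is preferred over plain "S".
RULES = [
  [:if,      [:IF, :expr, :THEN, :stmts, :END]],
  [:stmts,   [:stmts, :S]],
  [:program, [:if]],
  [:expr,    [:E]],
  [:stmts,   [:S]],
]

def parse(input)
  stack, trace = [], []
  input.each do |tok|
    stack.push(tok)                          # shift
    trace << "shift #{tok}"
    loop do                                  # reduce while some right side matches
      rule = RULES.find { |_, rhs| stack.last(rhs.size) == rhs }
      break unless rule
      lhs, rhs = rule
      stack.pop(rhs.size)
      stack.push(lhs)
      trace << "reduce #{rhs.join(' ')} to #{lhs}"
    end
  end
  [stack, trace]
end

stack, trace = parse(%i[IF E THEN S S S END])
puts trace
p stack   # [:program]
```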
To end this section, there's one thing to be careful about: a reduction does not always mean decreasing the number of symbols. If there's a void rule, it's possible that a symbol is generated out of "void".

Action

Now I'll start to describe the important parts. Whether shifting or reducing, merely moving things around inside the semantic stack is not meaningful. Since our ultimate goal is building a syntax tree, it is not sufficient unless it leads to that. How does yacc do it for us? The answer yacc gives is that "we shall be able to hook the moment when the parser performs a reduction." The hooks are called actions of the parser. An action can be written at the end of a rule as follows.

    program: A B C { /* Here is an action */ }

The part between { and } is the action. If you write it like this, this action will be executed at the moment A B C is reduced to program. You are free to do whatever you like in an action. As long as it is C code, almost anything can be written.
The value of a symbol

This is even more important: each symbol has "its value". Both terminal and nonterminal symbols do. As for terminal symbols, since they come from the scanner, their values are also given by the scanner. For example, 1 or 9 or maybe 108 for a NUMBER symbol. For an IDENTIFIER symbol, it might be "attr" or "name" or "sym". Anything is fine. Each symbol and its value are pushed together onto the semantic stack. The next figure shows the state just at the moment S is shifted with its value.

    IF     expr    THEN    stmts   S
    value  value   value   value   value

According to the previous rule, stmts S can be reduced to stmts. If an action is written for the rule, it will be executed, and at that moment the values of the symbols corresponding to the right-hand side are passed to the action.

    IF    expr   THEN   stmts  S      /* Stack */
    v1    v2     v3     v4     v5
                        ↓      ↓
                stmts:  stmts  S      /* Rule */
                        ↓      ↓
                      { $1  +  $2; }  /* Action */

This way an action can take the value of each symbol corresponding to the right-hand side of a rule through $1, $2, $3, … yacc rewrites things like $1 and $2 into a notation that points into the stack. However, because the output is written in C it needs to handle, for instance, types; since that is tiresome, let's assume their types are int for the moment.

Next, the symbol of the left-hand side is pushed instead, but because all symbols have values, the left-hand side symbol must also have its value. It is expressed as $$ in actions; the value of $$ when leaving an action will be the value of the left-hand side symbol.

    IF    expr   THEN   stmts  S      /* the stack just before reducing */
    v1    v2     v3     v4     v5
                        ↓      ↓
                stmts:  stmts  S      /* the rule whose right-hand side matches the end */
                  ↑     ↓      ↓
                { $$  = $1  +  $2; }  /* its action */

    IF    expr   THEN   stmts         /* the stack after reducing */
    v1    v2     v3     (v4+v5)
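The interplay of symbols, values and actions can be imitated with two parallel stacks. In this sketch the arguments of each lambda play the role of $1, $2, … and its return value plays the role of $$. The greedy matching and the names `VALUE_RULES` and `run` are assumptions of this example, not yacc's actual mechanism.

```ruby
# [left side, right side, action]
VALUE_RULES = [
  [:stmts, [:stmts, :S], ->(v1, v2) { v1 + v2 }],   # { $$ = $1 + $2; }
  [:stmts, [:S],         ->(v1)     { v1 }],        # { $$ = $1; }
]

def run(tokens)                      # tokens: [[symbol, value], ...]
  syms, vals = [], []
  tokens.each do |sym, val|
    syms.push(sym)                   # a symbol and its value are pushed together
    vals.push(val)
    loop do
      rule = VALUE_RULES.find { |_, rhs, _| syms.last(rhs.size) == rhs }
      break unless rule
      lhs, rhs, action = rule
      args = vals.pop(rhs.size)      # the values of the right-hand side: $1, $2, ...
      syms.pop(rhs.size)
      syms.push(lhs)
      vals.push(action.call(*args))  # the result becomes the value of the left side
    end
  end
  [syms, vals]
end

p run([[:S, 1], [:S, 2], [:S, 3]])   # [[:stmts], [6]]
```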
To end this section, here is just an extra. The value of a symbol is sometimes called its "semantic value". Therefore the stack that holds them is the "semantic value stack", and it is called the "semantic stack" for short.

yacc and types

It's really cumbersome, but we cannot finish this talk without talking about types. What is the type of the value of a symbol? To state the bottom line first, it is the type named YYSTYPE. This must be the abbreviation of either YY Stack TYPE or Semantic value TYPE. And YYSTYPE is obviously a typedef of some other type. That type is the union defined with the instruction named %union in the definition part.

We have not written %union before, but it did not cause an error. Why? This is because yacc considerately falls back to a default without asking. The default type in C should naturally be int. Therefore, YYSTYPE is int by default.

For an example in a yacc book or for a calculator, int can be used unchanged. But in order to build a syntax tree, we want to use structs, pointers and various other things. Therefore, for instance, we use %union as follows.

    %union {
        struct node {
            int type;
            struct node *left;
            struct node *right;
        } *node;
        int num;
        char *str;
    }

Because this is not for practical use, arbitrary names are used for the types and members. Notice that, unlike in ordinary C, there's no semicolon at the end of the %union block.

And, if this is written, it would look like the following in y.tab.c.

    typedef union {
        struct node {
            int type;
            struct node *left;
            struct node *right;
        } *node;
        int num;
        char *str;
    } YYSTYPE;
And, as for the semantic stack,

    YYSTYPE yyvs[256];       /* the substance of the stack (yyvs = YY Value Stack) */
    YYSTYPE *yyvsp = yyvs;   /* the pointer to the end of the stack */

we can expect something like this. Therefore, the values of the symbols appearing in actions would be

    /* the action before processed by yacc */
    target: A B C { func($1, $2, $3); }

    /* after converted, its appearance in y.tab.c */
    { func(yyvsp[-2], yyvsp[-1], yyvsp[0]); ;

naturally like this.

In this case, because the default type int is used, the value can be accessed just by referring to the stack. If YYSTYPE is a union, it is necessary to also specify one of its members. There are two ways to do that: one is associating a member with each symbol, the other is specifying it every time.

Generally, the way of associating a member with each symbol is used. By using %token for terminal symbols and %type for nonterminal symbols, it is written as follows.

    %token<num> A B C    /* All of the values of A B C are of type int */
    %type<str> target    /* All of the values of target are of type char* */

On the other hand, if you'd like to specify it every time, you can write a member name next to the $ as follows.

    %union { char *str; }
    %%
    target: { $<str>$ = "In short, this is like typecasting"; }

You'd better avoid using this method if possible. Defining a member for each symbol is the basic approach.
Coupling the parser and the scanner together

With that, I've finished talking about the ins and outs of the values inside the parser. For the rest, I'll talk about the protocol for connecting with the scanner, and then the heart of this story will be finished.

First, let me remind you that, as I mentioned, the scanner is the yylex() function. Each (terminal) symbol itself is returned (as an int) as the return value of the function. Since constants with the same names as the symbols are defined (with #define) by yacc, we can write NUMBER for a NUMBER. And its value is passed by putting it into a global variable named yylval. This yylval is also of type YYSTYPE, and exactly the same things can be said of it as for the parser. In other words, if it is defined with %union it becomes a union. But this time the member is not automatically selected; its member name has to be written manually. A very simple example would look like the following.

    static int
    yylex()
    {
        yylval.str = next_token();
        return STRING;
    }

Figure 2 summarizes the relationships described so far. I'd like you to check them one by one. yylval, $$, $1, $2 … all of these variables that become the interfaces are of type YYSTYPE.

Figure 2: Relationships among yacc related variables & functions
Embedded Action

It was explained that an action is written at the end of a rule. However, it can actually be written in the middle of a rule.

    target: A B { puts("embedded action"); } C D

This is called an "embedded action". An embedded action is merely syntactic sugar for the following definition:

    target: A B dummy C D

    dummy :     /* void rule */
            {
                puts("embedded action");
            }

From this example, you might be able to tell everything, including when it is executed. The value of a symbol can also be taken. In other words, in this example, the value of the embedded action will come out as $3.
Practical Topics

Conflicts

I'm not afraid of yacc anymore.

If you thought so, you are too naive. The reason why everyone is so afraid of yacc is about to be revealed.

Up until now, I have written, not so carefully, "when the right-hand side of the rule matches the end of the stack", but what happens if there's a rule like this:

    target : A B C
           | A B C

When the sequence of symbols A B C actually comes out, it would be hard to determine which rule matches. Such a thing cannot be interpreted even by humans. Therefore yacc cannot understand it either. When yacc finds an odd grammar like this, it complains that a reduce/reduce conflict occurs. It means that multiple rules can be reduced at the same time.

    % yacc rrconf.y
    conflicts: 1 reduce/reduce
But usually, I think you won't do such things except by accident. But how about the next example? The described symbol sequence is completely the same.

    target : abc
           | A bc

    abc    : A B C

    bc     : B C

This is relatively likely to happen. Especially when each part is moved around in complicated ways while developing the rules, it is often the case that this kind of rule is made without noticing.

There's also a similar pattern, as follows:

    target : abc
           | ab C

    abc    : A B C

    ab     : A B

When the symbol sequence A B C comes out, it's hard to determine whether it should choose the single abc or the combination of ab and C. In this case, yacc will complain that a shift/reduce conflict occurs. This means there are both a rule that can shift and a rule that can reduce at the same time.

    % yacc srconf.y
    conflicts: 1 shift/reduce
The famous example of shift/reduce conflicts is "the hanging else problem". For example, the if statement of the C language causes this problem. I'll describe it by simplifying the case:

    stmt     : expr ';'
             | if

    expr     : IDENTIFIER

    if       : IF '(' expr ')' stmt
             | IF '(' expr ')' stmt ELSE stmt

In this rule, the expression is only IDENTIFIER (a variable), and the body of if is only one statement. Now, what happens if the next program is parsed with this grammar?

    if (cond)
        if (cond)
            true_stmt;
        else
            false_stmt;

If it is written this way, we might feel like it's quite obvious. But actually, this can also be interpreted as follows.

    if (cond) {
        if (cond)
            true_stmt;
    }
    else {
        false_stmt;
    }

The question is "between the two ifs, the inside one or the outside one, which is the one to which the else should be attached?".

However, shift/reduce conflicts are relatively less harmful than reduce/reduce conflicts, because usually they can be solved by choosing shift. Choosing shift is almost equivalent to "connecting the elements closer to each other", and it easily matches human instincts. In fact, the hanging else can also be solved by shifting it. Hence, yacc follows this trend: it chooses shift by default when a shift/reduce conflict occurs.
Look-ahead

As an experiment, I'd like you to process the next grammar with yacc.

    %token A B C
    %%
    target : A B C   /* rule 1 */
           | A B     /* rule 2 */

We can't help expecting that there should be a conflict. At the time when it has read up to A B, rule 1 would attempt to shift and rule 2 would attempt to reduce. In other words, this should cause a shift/reduce conflict. However, ….

    % yacc conf.y
    %

It's odd, there's no conflict. Why?

In fact, the parser created with yacc can look ahead by exactly one symbol. Before actually doing the shift or reduce, it can decide what to do by peeking at the next symbol.

This is also taken into account when the parser is generated: if the rule can be determined by a single look-ahead, conflicts are avoided. In the previous rules, for instance, if C comes right after A B, only rule 1 is possible and it is chosen (shift). If the input has finished, rule 2 is chosen (reduce).
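The decision made at the point where A B is on the stack can be written as a small function of the next symbol. This is a hand-written sketch for this one grammar; `decide` is a made-up name, and `nil` is assumed to stand for the end of the input.

```ruby
# target : A B C   (rule 1)
#        | A B     (rule 2)
# Decide what to do when A B is on the stack, by peeking at one symbol.
def decide(stack, lookahead)
  return :error unless stack == [:A, :B]
  case lookahead
  when :C  then :shift     # only rule 1 can continue
  when nil then :reduce    # end of input: only rule 2 fits
  else          :error
  end
end

p decide([:A, :B], :C)    # :shift
p decide([:A, :B], nil)   # :reduce
```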
Notice that the word "look-ahead" has two meanings: one is the look-ahead while processing *.y with yacc. The other is the look-ahead while actually executing the generated parser. The look-ahead during execution is not so difficult, but the look-ahead of yacc itself is pretty complicated. That's because it needs to predict all possible input patterns and decide its behavior from only the grammar rules.

However, because "all possible" is actually impossible, it handles "most" patterns. How broad a range of patterns it can cover shows the strength of a look-ahead algorithm. The look-ahead algorithm that yacc uses when processing grammar files is LALR(1), which is relatively powerful among the currently existing algorithms for resolving conflicts.

A lot of things have been introduced, but you don't have to worry so much, because what we do in this book is only reading, not writing. What I wanted to explain here is not the look-ahead of grammars but the look-ahead during execution.
Operator Precedence

Since the abstract talk has gone on for long, I'll talk more concretely. Let's try to define the rules for infix operators such as + or *. There are also established tactics for this, so we'd better follow them obediently. Something like a calculator for arithmetic operations is defined below:

    expr    : expr '+' expr
            | expr '-' expr
            | expr '*' expr
            | expr '/' expr
            | primary

    primary : NUMBER
            | '(' expr ')'

primary is the smallest grammar unit. The point is that an expr between parentheses becomes a primary.

Then, if this grammar is written to an arbitrary file and processed with yacc, the result would be this.

    % yacc infix.y
    16 shift/reduce conflicts

They conflict aggressively. Thinking for 5 minutes is enough to see that this rule causes a problem in the following and similar cases:

    1 - 1 - 1

This can be interpreted in both of the next two ways.

    (1 - 1) - 1
    1 - (1 - 1)

The former is natural as a numerical expression. But yacc only processes the sequence of symbols as it appears; it attaches no meaning to it. Things such as the meaning the - symbol has are absolutely not considered at all. In order to correctly reflect a human intention, we have to specify what we want step by step.

Then, what we can do is write this in the definition part.

    %left '+' '-'
    %left '*' '/'
These instructions specify both the precedence and the associativity at the same time. I'll explain them in order.

I think that the term "precedence" often appears when talking about the grammar of a programming language. Describing it logically is complicated, so to put it intuitively, it is about which operator the parentheses are attached to in the following and similar cases.

    1 + 2 * 3

If * has higher precedence, it would be this.

    1 + (2 * 3)

If + has higher precedence, it would be this.

    (1 + 2) * 3

As shown above, operator precedence is the resolving of shift/reduce conflicts by defining which operators are stronger and which are weaker.

However, if the operators have the same precedence, how can it be resolved? Like this, for instance,

    1 - 2 - 3

because both operators are -, their precedences are completely the same. In this case, it is resolved by using the associativity. There are three types of associativity: left, right and nonassoc. They are interpreted as follows:

    Associativity                  Interpretation
    left (left-associative)        (1 - 2) - 3
    right (right-associative)      1 - (2 - 3)
    nonassoc (non-associative)     parse error
Most of the operators for numerical expressions are left-associative. Right-associativity is used mainly for the = of assignment and the not of negation.

    a = b = 1    # (a = (b = 1))
    not not a    # (not (not a))

The representatives of non-associative operators are probably the comparison operators.

    a == b == c   # parse error
    a <= b <= c   # parse error

However, this is not the only possibility. In Python, for instance, comparisons between three terms are possible.

The instructions named %left, %right and %nonassoc seen earlier are used to specify the associativities of their names. And precedence is specified by the order of the instructions: the lower an operator is written, the higher its precedence. Operators written on the same line have the same level of precedence.

    %left  '+' '-'    /* left-associative and third precedence  */
    %left  '*' '/'    /* left-associative and second precedence */
    %right '!'        /* right-associative and first precedence */
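The effect of precedence and associativity can be observed from the Ruby level. The following sketch only checks the results of a few expressions; the `**` operator does not appear in the text above and is used here as an example of a right-associative numeric operator in Ruby.

```ruby
a = b = 1                      # right-associative: a = (b = 1)

results = {
  precedence: 1 + 2 * 3,       # * binds tighter: 1 + (2 * 3)
  left:       1 - 2 - 3,       # left-associative: (1 - 2) - 3
  right:      2 ** 3 ** 2,     # right-associative: 2 ** (3 ** 2)
  assign:     [a, b],
}

p results
# {precedence: 7, left: -4, right: 512, assign: [1, 1]} (the exact inspect format depends on the Ruby version)
```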
Translated by Robert GRAVINA & ocha-

Chapter 10: Parser

Outline of this chapter

Parser construction

The main source of the parser is parse.y. Because it is *.y, it is the input for yacc, and parse.c is generated from it.

Although one would expect lex.c to contain the scanner, this is not the case. This file is created by gperf, taking the file keywords as input, and defines the reserved word hashtable. This tool-generated lex.c is #included in (the also tool-generated) parse.c. The details of this process are somewhat difficult to explain at this time, so we shall return to this later.

Figure 1 shows the parser construction process. For the benefit of those readers using Windows who may not be aware, the mv (move) command creates a new copy of a file and removes the original. cc is, of course, the C compiler and cpp the C pre-processor.

Figure 1: Parser construction process
Dissectingparse.yLet’snowlookatparse.yinabitmoredetail.Thefollowingfigurepresentsaroughoutlineofthecontentsofparse.y.
▼ parse.y

    %{
    header
    %}
    %union ....
    %token ....
    %type ....

    %%

    rules

    %%
    user code section
        parser interface
        scanner (character stream processing)
        syntax tree construction
        semantic analysis
        local variable management
        ID implementation
As for the rules and definitions part, it is as previously described. Since this part is indeed the heart of the parser, I’ll start to explain it ahead of the other parts in the next section.
There are a considerable number of support functions defined in the user code section, but roughly speaking, they can be divided into the six parts written above. The following table shows where each of the parts is explained in this book.
    Part                        Chapter                                  Section
    Parser interface            This chapter                             Section 3 “Scanning”
    Scanner                     This chapter                             Section 3 “Scanning”
    Syntax tree construction    Chapter 12 “Syntax tree construction”    Section 2 “Syntax tree construction”
    Semantic analysis           Chapter 12 “Syntax tree construction”    Section 3 “Semantic analysis”
    Local variable management   Chapter 12 “Syntax tree construction”    Section 4 “Local variables”
    ID implementation           Chapter 3 “Names and name tables”        Section 2 “ID and symbols”
General remarks about grammar rules
Coding rules
The grammar of ruby conforms to a coding standard and is thus easy to read once you are familiar with it.
Firstly, regarding symbol names, all non-terminal symbols are written in lowercase characters. Terminal symbols are prefixed by some lowercase character and then followed by uppercase. Reserved words (keywords) are prefixed with the character k. Other terminal symbols are prefixed with the character t.
▼ Symbol name examples

    Token                    Symbol name
    (non-terminal symbol)    bodystmt
    if                       kIF
    def                      kDEF
    rescue                   kRESCUE
    varname                  tIDENTIFIER
    ConstName                tCONSTANT
    1                        tINTEGER
The only exceptions to these rules are klBEGIN and klEND. These symbol names refer to the reserved words for “BEGIN” and “END”, respectively, and the l here stands for large. Since the reserved words begin and end already exist (naturally, with symbol names kBEGIN and kEND), these non-standard symbol names were required.
Important symbols
parse.y contains both grammar rules and actions; however, for now I would like to concentrate on the grammar rules alone. The script sample/exyacc.rb can be used to extract the grammar rules from this file. Aside from this, running yacc -v will create a log file y.output which also contains the grammar rules; however, it is rather difficult to read. In this chapter I have used a slightly modified version of exyacc.rb (tools/exyacc2.rb, located on the attached CD-ROM) to extract the grammar rules.
▼ parse.y (rules)

    program         : compstmt

    bodystmt        : compstmt
                      opt_rescue
                      opt_else
                      opt_ensure

    compstmt        : stmts opt_terms
                           :
                           :
The output is quite long (over 450 lines of grammar rules) and as such I have only included the most important parts in this chapter.
Which symbols, then, are the most important? Names such as program, expr, stmt, primary, arg etc. are always very important. It’s because they represent the general parts of the grammatical elements of a programming language. The following table outlines the elements we should generally focus on in the syntax of a program.
    Syntax element                       Predicted symbol names
    Program                              program prog file input stmts whole
    Sentence                             statement stmt
    Expression                           expression expr exp
    Smallest element                     primary prim
    Left hand side of an expression      lhs (left hand side)
    Right hand side of an expression     rhs (right hand side)
    Function call                        funcall function_call call function
    Method call                          method method_call call
    Argument                             argument arg
    Function definition                  defun definition function fndef
    Declarations                         declaration decl
In general, programming languages tend to have the following hierarchy structure.

    Program element    Properties
    Program            Usually a list of statements
    Statement          What cannot be combined with the others. A syntax tree trunk.
    Expression         What is a combination by itself and can also be a part of another expression. A syntax tree internal node.
    Primary            An element which cannot be further decomposed. A syntax tree leaf node.
The statements are things like function definitions in C or class definitions in Java. An expression can be a procedure call, an arithmetic expression etc., while a primary usually refers to a string literal or number. Some languages do not contain all of these symbol types; however, they generally contain some kind of hierarchy of symbols such as program→stmt→expr→primary.
However, a structure at a low level can be contained by a superior structure. For example, in C a function call is an expression, but it can also be written on its own. That is, it is an expression but it can also be a statement.
Conversely, when surrounded in parentheses, expressions become primaries. This is because the lower the level of an element, the higher the precedence it has.
The range of statements differs considerably between programming languages. Let’s consider assignment as an example. In C, because it is part of expressions, we can use the value of the whole assignment expression. But in Pascal, assignment is a statement, so we cannot do such a thing. Also, function and class definitions are typically statements; however, in languages such as Lisp and Scheme, since everything is an expression, they do not have statements in the first place. Ruby is close to Lisp’s design in this regard.
Program structure
Now let’s turn our attention to the grammar rules of ruby. Firstly, in yacc, the left hand side of the first rule represents the entire grammar. Currently, it is program. Following further and further from here, as with the established tactic, the four symbols program, stmt, expr and primary will be found. Adding arg to them, let’s look at their rules.
▼ ruby grammar (outline)

    program         : compstmt

    compstmt        : stmts opt_terms

    stmts           : none
                    | stmt
                    | stmts terms stmt

    stmt            : kALIAS fitem fitem
                    | kALIAS tGVAR tGVAR
                         :
                         :
                    | expr

    expr            : kRETURN call_args
                    | kBREAK call_args
                         :
                         :
                    | '!' command_call
                    | arg

    arg             : lhs '=' arg
                    | var_lhs tOP_ASGN arg
                    | primary_value '[' aref_args ']' tOP_ASGN arg
                         :
                         :
                    | arg '?' arg ':' arg
                    | primary

    primary         : literal
                    | strings
                         :
                         :
                    | tLPAREN_ARG expr ')'
                    | tLPAREN compstmt ')'
                         :
                         :
                    | kREDO
                    | kRETRY
If we focus on the last rule of each element, we can clearly make out a hierarchy of program→stmt→expr→arg→primary.
Also, we’d like to focus on this rule of primary.

    primary         : literal
                         :
                         :
                    | tLPAREN_ARG expr ')'      /* here */

The name tLPAREN_ARG comes from t for terminal symbol, L for left and PAREN for parentheses: it is the open parenthesis. Why this isn’t '(' is covered in the next section “Context-dependent scanner”. Anyway, the purpose of this rule is to demote an expr to a primary. This creates a cycle which can be seen in Figure 2, and the arrow shows how this rule is reduced during parsing.
Figure 2: expr demotion
The next rule is also particularly interesting.

    primary         : literal
                         :
                         :
                    | tLPAREN compstmt ')'      /* here */

A compstmt, which is equivalent to the entire program (program), can be demoted to a primary with this rule. The next figure illustrates this rule in action.
Figure 3: program demotion
This means that for any syntax element in Ruby, if we surround it with parentheses it will become a primary and can be passed as an argument to a function, be used as the right hand side of an expression etc. This is an incredible fact. Let’s actually confirm it.
    p((class C; end))
    p((def a() end))
    p((alias ali gets))
    p((if true then nil else nil end))
    p((1 + 1 * 1 ** 1 - 1 / 1 ^ 1))

If we invoke ruby with the -c option (syntax check), we get the following output.

    % ruby -c primprog.rb
    Syntax OK

Indeed, it’s hard to believe, but it actually passes. Apparently, we did not get the wrong idea.
If we care about the details, since there are things rejected by the semantic analysis (see also Chapter 12 “Syntax tree construction”), it is not perfectly possible. For example, passing a return statement as an argument to a function will result in an error. But at least at the level of appearance, the “surrounding anything in parentheses means it can be passed as an argument to a function” rule does hold.
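Beyond the syntax check, the demoted elements can also be used as ordinary values at run time. The following is a minimal sketch for a current Ruby interpreter, not the version examined in this book; the method name primary_demo is a made-up name for illustration. Note that on recent Ruby versions a def expression evaluates to the method name as a symbol, which is an assumption about the interpreter you run this on.

```ruby
# Each parenthesized construct below sits where a plain value is expected.
values = [
  (def primary_demo() end),            # a method definition as a primary
  (if true then nil else nil end),     # an if as a primary
  (while false do end),                # a while loop as a primary
  (1 + 1 * 1 ** 1 - 1 / 1 ^ 1),        # ordinary arithmetic
]

p values
```

The control structures simply evaluate to the value of their body (nil here), which is what allows them to appear as array elements or arguments.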
In the next section I will cover the contents of the important elements one by one.
program
▼ program

    program         : compstmt

    compstmt        : stmts opt_terms

    stmts           : none
                    | stmt
                    | stmts terms stmt

As mentioned earlier, program represents the entire grammar, that is, the entire program. That program equals compstmt, and compstmt is almost equivalent to stmts. That stmts is a list of stmts delimited by terms. Hence, the entire program is a list of stmts delimited by terms.
terms is (of course) an abbreviation for “terminators”, the symbols that terminate the sentences, such as semicolons or newlines. opt_terms means “OPTional terms”. The definitions are as follows:
▼ opt_terms

    opt_terms       :
                    | terms

    terms           : term
                    | terms ';'

    term            : ';'
                    | '\n'
The initial ; or \n of a terms can be followed by any number of ; only. Based on that, you might start thinking that if there are 2 or more consecutive newlines, it could cause a problem. Let’s try and see what actually happens.

    1 + 1   # first newline
            # second newline
            # third newline
    1 + 1

Run that with ruby -c.

    % ruby -c optterms.rb
    Syntax OK

Strange, it worked! What actually happens is this: consecutive newlines are simply discarded by the scanner, which returns only the first newline in a series.
By the way, although we said that program is the same as compstmt, if that was really true, you would question why compstmt exists at all. Actually, the distinction is there only for execution of semantic actions. program exists to execute any semantic actions which should be done once in the processing of an entire program. If it was only a question of parsing, program could be omitted with no problems at all.
To generalize this point, the grammar rules can be divided into 2 groups: those which are needed for parsing the program structure, and those which are needed for execution of semantic actions. The none rule which was mentioned earlier when talking about stmts is another one which exists for executing actions: it’s used to return a NULL pointer for an empty list of type NODE*.
stmt
Next is stmt. This one is rather involved, so we’ll look into it a bit at a time.
▼ stmt(1)

    stmt            : kALIAS fitem fitem
                    | kALIAS tGVAR tGVAR
                    | kALIAS tGVAR tBACK_REF
                    | kALIAS tGVAR tNTH_REF
                    | kUNDEF undef_list
                    | stmt kIF_MOD expr_value
                    | stmt kUNLESS_MOD expr_value
                    | stmt kWHILE_MOD expr_value
                    | stmt kUNTIL_MOD expr_value
                    | stmt kRESCUE_MOD stmt
                    | klBEGIN '{' compstmt '}'
                    | klEND '{' compstmt '}'

Looking at that, somehow things start to make sense. The first few have alias, then undef, then the next few are all something followed by _MOD: those should be statements with postposition modifiers, as you can imagine.
expr_value and primary_value are grammar rules which exist to execute semantic actions. For example, expr_value represents an expr which has a value. Expressions which don’t have values are return and break, or return/break followed by a postposition modifier, such as an if clause. For a detailed definition of what it means to “have a value”, see chapter 12, “Syntax Tree Construction”. In the same way, primary_value is a primary which has a value.
As explained earlier, klBEGIN and klEND represent BEGIN and END.
▼ stmt(2)

                    | lhs '=' command_call
                    | mlhs '=' command_call
                    | var_lhs tOP_ASGN command_call
                    | primary_value '[' aref_args ']' tOP_ASGN command_call
                    | primary_value '.' tIDENTIFIER tOP_ASGN command_call
                    | primary_value '.' tCONSTANT tOP_ASGN command_call
                    | primary_value tCOLON2 tIDENTIFIER tOP_ASGN command_call
                    | backref tOP_ASGN command_call

Looking at these rules all at once is the right approach. The common point is that they all have command_call on the right-hand side. command_call represents a method call with the parentheses omitted. The new symbols which are introduced here are explained in the following table. I hope you’ll refer to the table as you check over each grammar rule.

    lhs            the left hand side of an assignment (Left Hand Side)
    mlhs           the left hand side of a multiple assignment (Multiple Left Hand Side)
    var_lhs        the left hand side of an assignment to a kind of variable (VARiable Left Hand Side)
    tOP_ASGN       compound assignment operator like += or *= (OPerator ASsiGN)
    aref_args      argument to a [] method call (Array REFerence)
    tIDENTIFIER    identifier which can be used as a local variable
    tCONSTANT      constant identifier (with leading uppercase letter)
    tCOLON2        ::
    backref        $1 $2 $3 ...

aref is Lisp jargon. There’s also aset as the other half of the pair, which is an abbreviation of “array set”. This abbreviation is used in a lot of places in the source code of ruby.
▼ stmt(3)

                    | lhs '=' mrhs_basic
                    | mlhs '=' mrhs

These two are multiple assignments. mrhs has the same structure as mlhs and it means multiple rhs (the right hand side). We’ve come to recognize that knowing the meanings of names makes the comprehension much easier.
▼ stmt(4)

                    | expr

Lastly, it joins to expr.
expr
▼ expr

    expr            : kRETURN call_args
                    | kBREAK call_args
                    | kNEXT call_args
                    | command_call
                    | expr kAND expr
                    | expr kOR expr
                    | kNOT expr
                    | '!' command_call
                    | arg

Expression. The expression of ruby is very small in grammar. That’s because most of what is ordinarily contained in expr has gone into arg. Conversely speaking, what could not go into arg is left here. And what is left is, again, method calls without parentheses. call_args is a bare argument list, and command_call is, as previously mentioned, a method call without parentheses. If this kind of thing were contained in a “small” unit, it would cause a tremendous number of conflicts.
However, these two below are of a different kind.

    expr kAND expr
    expr kOR expr

kAND is “and”, and kOR is “or”. Since these two have roles as control structures, they must be contained in a “big” syntax unit which is larger than command_call. And since command_call is contained in expr, they need to be at least expr to work well. For example, the following usage is possible:

    valid_items.include? arg or raise ArgumentError, 'invalid arg'
    # valid_items.include?(arg) or raise(ArgumentError, 'invalid arg')

However, if the rule of kOR existed in arg instead of expr, it would be joined as follows.

    valid_items.include?((arg or raise)) ArgumentError, 'invalid arg'

Obviously, this would end up as a parse error.
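To see the first reading in action, here is a small sketch that wraps the same line in a method. The method name check_item and the sample values are assumptions chosen only for this illustration.

```ruby
# "or" joins two parenthesis-less method calls, so it acts as control flow:
# raise runs only when include? returns false.
def check_item(arg, valid_items)
  valid_items.include? arg or raise ArgumentError, 'invalid arg'
  arg
end

p check_item(1, [1, 2])

begin
  check_item(3, [1, 2])
rescue ArgumentError => e
  p e.message
end
```

Because or sits above command_call in the grammar, both sides can be written without parentheses and still group the way the comment in the text describes.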
arg
▼ arg

    arg             : lhs '=' arg
                    | var_lhs tOP_ASGN arg
                    | primary_value '[' aref_args ']' tOP_ASGN arg
                    | primary_value '.' tIDENTIFIER tOP_ASGN arg
                    | primary_value '.' tCONSTANT tOP_ASGN arg
                    | primary_value tCOLON2 tIDENTIFIER tOP_ASGN arg
                    | backref tOP_ASGN arg
                    | arg tDOT2 arg
                    | arg tDOT3 arg
                    | arg '+' arg
                    | arg '-' arg
                    | arg '*' arg
                    | arg '/' arg
                    | arg '%' arg
                    | arg tPOW arg
                    | tUPLUS arg
                    | tUMINUS arg
                    | arg '|' arg
                    | arg '^' arg
                    | arg '&' arg
                    | arg tCMP arg
                    | arg '>' arg
                    | arg tGEQ arg
                    | arg '<' arg
                    | arg tLEQ arg
                    | arg tEQ arg
                    | arg tEQQ arg
                    | arg tNEQ arg
                    | arg tMATCH arg
                    | arg tNMATCH arg
                    | '!' arg
                    | '~' arg
                    | arg tLSHFT arg
                    | arg tRSHFT arg
                    | arg tANDOP arg
                    | arg tOROP arg
                    | kDEFINED opt_nl arg
                    | arg '?' arg ':' arg
                    | primary
Although there are many rules here, the complexity of the grammar is not proportionate to the number of rules. A grammar that merely has a lot of cases can be handled very easily by yacc; rather, the depth or recursiveness of the rules has more influence on the complexity.
Then, it makes us curious that the rules are defined recursively in the form arg OP arg in the place of the operators, but because operator precedence is defined for all of these operators, this is virtually a mere enumeration. Let’s cut the “mere enumeration” out of the arg rule by merging.

    arg: lhs '=' arg              /* 1 */
       | primary T_opeq arg       /* 2 */
       | arg T_infix arg          /* 3 */
       | T_pre arg                /* 4 */
       | arg '?' arg ':' arg      /* 5 */
       | primary                  /* 6 */

There’s no point in distinguishing terminal symbols from lists of terminal symbols, so they are all expressed with symbols prefixed with T_. opeq is operator + equal, T_pre represents the prefix operators such as '!' and '~', and T_infix represents the infix operators such as '*' and '%'.
To avoid conflicts in this structure, points like the ones written below become important (but these do not cover everything).
T_infix should not contain '='.
Since arg partially overlaps lhs, if '=' is contained, rule 1 and rule 3 cannot be distinguished.
T_opeq and T_infix should not have any common rule.
Since arg contains primary, if they have any common rule, rule 2 and rule 3 cannot be distinguished.
T_infix should not contain '?'.
If it does, rules 3 and 5 would produce a shift/reduce conflict.
T_pre should not contain '?' or ':'.
If it does, rules 4 and 5 would conflict in a very complicated way.
The conclusion is that all requirements are met and this grammar does not conflict. We could say it’s a matter of course.
primary
Because primary has a lot of grammar rules, we’ll split them up and show them in parts.
▼ primary(1)

    primary         : literal
                    | strings
                    | xstring
                    | regexp
                    | words
                    | qwords

Literals. literal is for Symbol literals (:sym) and numbers.
▼ primary(2)

                    | var_ref
                    | backref
                    | tFID

Variables. var_ref is for local variables, instance variables and so on. backref is for $1 $2 $3 … tFID is for identifiers with ! or ?, say, include? or reject!. There’s no possibility of tFID being a local variable; even if it appears on its own, it becomes a method call at the parser level.
▼ primary(3)

                    | kBEGIN bodystmt kEND

bodystmt contains rescue and ensure. It means this is the begin of exception control.
▼ primary(4)

                    | tLPAREN_ARG expr ')'
                    | tLPAREN compstmt ')'

This has already been described. Syntax demoting.
▼ primary(5)

                    | primary_value tCOLON2 tCONSTANT
                    | tCOLON3 cname

Constant references. tCONSTANT is for constant names (capitalized identifiers).
Both tCOLON2 and tCOLON3 are ::, but tCOLON3 represents only the :: which means the top level. In other words, it is the :: of ::Const. The :: of Net::SMTP is tCOLON2.
The reason why different symbols are used for the same token is to deal with method calls without parentheses. For example, it is to distinguish the next two from each other:

    p Net::HTTP    # p(Net::HTTP)
    p Net  ::HTTP  # p(Net(::HTTP))

If there’s a space or a delimiter character such as an open parenthesis just before it, it becomes tCOLON3. In the other cases, it becomes tCOLON2.
▼ primary(6)

                    | primary_value '[' aref_args ']'

Index-form calls, for instance, arr[i].
▼ primary(7)

                    | tLBRACK aref_args ']'
                    | tLBRACE assoc_list '}'

Array literals and Hash literals. This tLBRACK also represents '['; a plain '[' means a '[' without a space in front of it. The necessity of this differentiation is also a side effect of method calls without parentheses.
The terminal symbols of this rule are very hard to tell apart because they differ by just one character. The following table shows how to read each type of parentheses, so I’d like you to make use of it when reading.
▼ English names for each parentheses

    Symbol    English Name
    ()        parentheses
    {}        braces
    []        brackets

▼ primary(8)

                    | kRETURN
                    | kYIELD '(' call_args ')'
                    | kYIELD '(' ')'
                    | kYIELD
                    | kDEFINED opt_nl '(' expr ')'

Syntaxes whose forms are similar to method calls. Respectively, return, yield, defined?.
There are arguments for yield, but return does not have any arguments. Why? The fundamental reason is that yield itself has a return value but return does not. However, even if there aren’t any arguments here, it does not mean you cannot pass values, of course. There was the following rule in expr.

    kRETURN call_args

call_args is a bare argument list, so it can deal with return 1 or return nil. Things like return(1) are handled as return (1). For this reason, surrounding the multiple arguments of a return with parentheses as in the following code should be impossible.

    return(1, 2, 3)   # interpreted as return (1,2,3) and results in parse error

You could understand more about this area if you check it again after reading the next chapter “Finite-State Scanner”.
▼ primary(9)

                    | operation brace_block
                    | method_call
                    | method_call brace_block

Method calls. method_call is with arguments (also with parentheses), operation is without both arguments and parentheses, and brace_block is either { ~ } or do ~ end; if it is attached to a method, the method is an iterator. As for the question “Even though it is brace, why is do ~ end contained in it?”, there’s a reason that is deeper than the Mariana Trench, but again the only way to understand is reading the next chapter “Finite-State Scanner”.
▼ primary(10)

                    | kIF expr_value then compstmt if_tail kEND           # if
                    | kUNLESS expr_value then compstmt opt_else kEND      # unless
                    | kWHILE expr_value do compstmt kEND                  # while
                    | kUNTIL expr_value do compstmt kEND                  # until
                    | kCASE expr_value opt_terms case_body kEND           # case
                    | kCASE opt_terms case_body kEND                      # case (Form2)
                    | kFOR block_var kIN expr_value do compstmt kEND      # for

The basic control structures. A little unexpectedly, things that appear to be this big are put inside primary, which is “small”. Because primary is also arg, we can also do something like this.

    p(if true then 'ok' end)   # shows "ok"

I mentioned that “almost all syntax elements are expressions” was one of the traits of Ruby. It is concretely expressed by the fact that if and while are in primary.
Why is there no problem if these “big” elements are contained in primary? That’s because Ruby’s syntax has the trait that “it begins with the terminal symbol A and ends with the terminal symbol B”. In the next section, we’ll think about this point again.
▼ primary(11)

                    | kCLASS cname superclass bodystmt kEND                       # class definition
                    | kCLASS tLSHFT expr term bodystmt kEND                       # singleton class definition
                    | kMODULE cname bodystmt kEND                                 # module definition
                    | kDEF fname f_arglist bodystmt kEND                          # method definition
                    | kDEF singleton dot_or_colon fname f_arglist bodystmt kEND   # singleton method definition

Definition statements. I’ve been calling them “class statements”, but strictly speaking I should probably have called them class primaries. These all fit the pattern “beginning with the terminal symbol A and ending with B”, so even if such rules are increased a lot more, it would never be a problem.
▼ primary(12)

                    | kBREAK
                    | kNEXT
                    | kREDO
                    | kRETRY

Various jumps. These are, well, not important from the viewpoint of grammar.
Conflicting Lists
In the previous section, the question “is it all right that if is in such a primary?” was raised. A precise proof is not easy, but an intuitive explanation is relatively easy. Here, let’s simulate with a small rule defined as follows:

    %token A B o
    %%
    element   : A item_list B

    item_list :
              | item_list item

    item      : element
              | o

element is the element that we are going to examine. For example, if we think about if, it would be if. element is a list that starts with the terminal symbol A and ends with B. As for if, it starts with if and ends with end. The o contents are methods or variable references or literals. As for an element of the list, it is either an o or a nested element.
With the parser based on this grammar, let’s try to parse the following input.

    A  A  o  o  o  B  o  A  o  A  o  o  o  B  o  B  B

They are nested too many times for humans to comprehend without some help such as indentation. But it becomes relatively easy if you think in the following way. Because it’s certain that an A and B pair which contains only several o between them is going to appear, replace such a pair with a single o when it appears. All we have to do is repeat this procedure. Figure 4 shows the result.
Figure 4: parse a list which starts with A and ends with B
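The “replace the innermost A ... B with a single o” procedure can be imitated with a few lines of Ruby. This is only a sketch of the mental procedure described above, not of how yacc actually parses; the method name collapse_lists is a hypothetical name introduced for this example.

```ruby
# Repeatedly replace every innermost "A o ... o B" run with a single "o"
# and record each intermediate string until nothing changes any more.
def collapse_lists(input)
  current = input.delete(" ")
  steps = [current]
  loop do
    reduced = current.gsub(/Ao*B/, "o")
    break if reduced == current
    current = reduced
    steps << current
  end
  steps
end

puts collapse_lists("A A o o o B o A o A o o o B o B B")
```

Each printed line corresponds to one round of replacements; when the whole input is a well-formed element, the last line is a single o.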
However, if the ending B is missing, …

    %token A o
    %%
    element   : A item_list    /* B is deleted for an experiment */

    item_list :
              | item_list item

    item      : element
              | o

I processed this with yacc and got 2 shift/reduce conflicts. It means this grammar is ambiguous. If we simply take B out from the previous one, the input would be as follows.

    A  A  o  o  o  o  A  o  A  o  o  o  o

This is hard to interpret in any way. However, there was a rule that “choose shift if it is a shift/reduce conflict”; let’s follow it as an experiment and parse the input with shift (meaning interior) taking precedence. (Figure 5)
Figure 5: parse a list of lists which start with A
It could be parsed. However, this is completely different from the intention of the input; there is no longer any way to split the list in the middle.
Actually, the parenthesis-less method calls of Ruby are in a similar situation to this. It’s not so easy to understand, but a pair of a method name and its first argument is A. This is because, since it is only between those two that there’s no comma, the pair can be recognized as the start of a new list.
Also, “practical” HTML contains this pattern. It is, for instance, when </p> or </i> is omitted. That’s why yacc could not be used for ordinary HTML at all.
Scanner
Parser Outline
I’ll explain the outline of the parser before moving on to the scanner. Take a look at Figure 6.
Figure 6: Parser Interface (Call Graph)
There are three official interfaces of the parser: rb_compile_cstr(), rb_compile_string() and rb_compile_file(). They read a program from a C string, a Ruby string object and a Ruby IO object, respectively, and compile it.
These functions, directly or indirectly, call yycompile(), and in the end, control is completely handed over to yyparse(), which is generated by yacc. Since the heart of the parser is nothing but yyparse(), it’s good to understand the whole by placing yyparse() at the center. In other words, the functions before moving on to yyparse() are all preparations, and the functions after yyparse() are merely chore functions being pushed around by yyparse().
The remaining functions in parse.y are auxiliary functions called by yylex(), and these can also be clearly categorized.
First, the input buffer is at the lowest level of the scanner. ruby is designed so that you can input source programs via both Ruby IO objects and strings. The input buffer hides that and makes it look like a single byte stream.
The next level is the token buffer. It reads 1 byte at a time from the input buffer, and keeps the bytes until they form a token.
Therefore, the whole structure of yylex can be depicted as Figure 7.
Figure 7: The whole picture of the scanner
The input buffer
Let’s start with the input buffer. Its interfaces are only these three: nextc(), pushback(), peek().
At the risk of repeating myself, the first thing to do is to investigate the data structures. The variables used by the input buffer are the following:
▼ the input buffer

    static char *lex_pbeg;
    static char *lex_p;
    static char *lex_pend;

(parse.y)
The beginning, the current position and the end of the buffer. Apparently, this buffer is a simple single-line string buffer (Figure 8).
Figure 8: The input buffer
nextc()
Then, let’s look at the places using them. First, I’ll start with nextc(), which seems the most orthodox.
▼ nextc()

    static inline int
    nextc()
    {
        int c;

        if (lex_p == lex_pend) {
            if (lex_input) {
                VALUE v = lex_getline();

                if (NIL_P(v)) return -1;
                if (heredoc_end > 0) {
                    ruby_sourceline = heredoc_end;
                    heredoc_end = 0;
                }
                ruby_sourceline++;
                lex_pbeg = lex_p = RSTRING(v)->ptr;
                lex_pend = lex_p + RSTRING(v)->len;
                lex_lastline = v;
            }
            else {
                lex_lastline = 0;
                return -1;
            }
        }
        c = (unsigned char)*lex_p++;
        if (c == '\r' && lex_p <= lex_pend && *lex_p == '\n') {
            lex_p++;
            c = '\n';
        }

        return c;
    }

(parse.y)
It seems that the first if is to test whether it has reached the end of the input buffer. And the if inside of it seems, since the else returns -1 (EOF), to test for the end of the whole input. Conversely speaking, when the input ends, lex_input becomes 0. ((errata: it does not. lex_input will never become 0 during ordinary scan.))
From this, we can see that strings are coming bit by bit into the input buffer. Since the name of the function which updates the buffer is lex_getline, it’s certain that one line comes in at a time.
Here is the summary:

    if ( reached the end of the buffer )
        if (still there's more input)
            read the next line
        else
            return EOF
    move the pointer forward
    skip reading CR of CRLF
    return c
Let’s also look at the function lex_getline(), which provides lines. The variables used by this function are shown together in the following.
▼ lex_getline()

    static VALUE (*lex_gets)();     /* gets function */
    static VALUE lex_input;         /* non-nil if File */

    static VALUE
    lex_getline()
    {
        VALUE line = (*lex_gets)(lex_input);
        if (ruby_debug_lines && !NIL_P(line)) {
            rb_ary_push(ruby_debug_lines, line);
        }
        return line;
    }

(parse.y)
Except for the first line, this is not important. Apparently, lex_gets should be the pointer to the function that reads a line, and lex_input should be the actual input. I searched for the places where lex_gets is set and this is what I found:
▼ set lex_gets

    NODE*
    rb_compile_string(f, s, line)
        const char *f;
        VALUE s;
        int line;
    {
        lex_gets = lex_get_str;
        lex_gets_ptr = 0;
        lex_input = s;

    NODE*
    rb_compile_file(f, file, start)
        const char *f;
        VALUE file;
        int start;
    {
        lex_gets = rb_io_gets;
        lex_input = file;

(parse.y)
rb_io_gets() is not a function exclusive to the parser but part of Ruby’s general-purpose library. It is the function to read a line from an IO object.
On the other hand, lex_get_str() is defined as follows:
▼ lex_get_str()

    static int lex_gets_ptr;

    static VALUE
    lex_get_str(s)
        VALUE s;
    {
        char *beg, *end, *pend;

        beg = RSTRING(s)->ptr;
        if (lex_gets_ptr) {
            if (RSTRING(s)->len == lex_gets_ptr) return Qnil;
            beg += lex_gets_ptr;
        }
        pend = RSTRING(s)->ptr + RSTRING(s)->len;
        end = beg;
        while (end < pend) {
            if (*end++ == '\n') break;
        }
        lex_gets_ptr = end - RSTRING(s)->ptr;
        return rb_str_new(beg, end - beg);
    }

(parse.y)
lex_gets_ptr remembers the place up to which the string has already been read. This function moves it to the next \n, and simultaneously cuts out the line at that place and returns it.
Here, let’s go back to nextc. As described, by preparing two functions with the same interface, it switches the function pointer when initializing the parser, and the other parts are used in common. It can also be said that the difference in the code is converted to data and absorbed. There was also a similar technique in st_table.
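The “two functions with the same interface” idea can be sketched in Ruby with callable objects. This is an illustration under assumptions, not a port of the C code: make_string_getter, make_io_getter and read_all_lines are hypothetical names, and the string getter only mimics the role of lex_get_str() and lex_gets_ptr for ASCII input.

```ruby
require 'stringio'

# Stand-in for lex_get_str(): hands out one line per call from a string,
# remembering in `offset` how far it has already read.
def make_string_getter(source)
  offset = 0
  lambda do
    return nil if offset == source.length       # everything was consumed
    newline = source.index("\n", offset)
    stop = newline ? newline + 1 : source.length
    line = source[offset...stop]
    offset = stop
    line
  end
end

# Stand-in for rb_io_gets(): hands out one line per call from an IO object.
def make_io_getter(io)
  lambda { io.gets }
end

# The consumer only knows "call it and get the next line, or nil at the end".
def read_all_lines(getter)
  lines = []
  while (line = getter.call)
    lines << line
  end
  lines
end

text = "a = 1\nb = 2\nputs a + b"
p read_all_lines(make_string_getter(text))
p read_all_lines(make_io_getter(StringIO.new(text)))
```

Both calls print the same list of lines, which is the point: the caller does not need to know which kind of input is behind the getter.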
pushback()
With the knowledge of the physical structure of the buffer and nextc, we can understand the rest easily. pushback() writes back a character. In C terms, it is ungetc().
▼ pushback()

    static void
    pushback(c)
        int c;
    {
        if (c == -1) return;
        lex_p--;
    }

(parse.y)
peek()
peek() checks the next character without moving the pointer forward.
▼ peek()

    #define peek(c) (lex_p != lex_pend && (c) == *lex_p)

(parse.y)
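To see how the three interfaces cooperate, here is a minimal Ruby model of the input buffer. It is a sketch under assumptions: the class name InputBuffer and its instance variables are invented for this example, the line getter is any callable that returns a String or nil, and details such as heredoc handling and line counting are left out.

```ruby
# A small model of the input buffer: one line is held at a time, and
# nextc/pushback/peek only ever look at that current line.
class InputBuffer
  EOF = -1

  def initialize(getter)
    @getter = getter   # callable returning the next line, or nil at the end
    @line = ""         # the line currently buffered
    @pos = 0           # read position inside @line
  end

  def nextc
    if @pos == @line.length          # reached the end of the buffer
      line = @getter.call
      return EOF if line.nil?
      @line = line
      @pos = 0
    end
    c = @line[@pos]
    @pos += 1
    if c == "\r" && @line[@pos] == "\n"   # read CRLF as a single "\n"
      @pos += 1
      c = "\n"
    end
    c
  end

  def pushback(c)
    return if c == EOF
    @pos -= 1
  end

  def peek(c)
    @pos != @line.length && @line[@pos] == c
  end
end

lines = ["a!=b\r\n", "c"]
buf = InputBuffer.new(-> { lines.shift })
chars = [buf.nextc]            # reads "a"
bang = buf.nextc               # reads "!"
saw_eq = buf.peek("=")         # looks at "=" without consuming it
buf.pushback(bang)             # the "!" will be read again
while (c = buf.nextc) != InputBuffer::EOF
  chars << c
end
p saw_eq
p chars.join
```

As in the C version, pushback() only moves the position back by one, so it is meant to be used right after a nextc() on the same line.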
The Token Buffer
The token buffer is the buffer of the next level. It keeps the characters until a token can be cut out. There are six interfaces, as follows:

    newtok     begin a new token
    tokadd     add a character to the buffer
    tokfix     fix a token
    tok        the pointer to the beginning of the buffered string
    toklen     the length of the buffered string
    toklast    the last byte of the buffered string

Now, we’ll start with the data structures.
▼ The Token Buffer

    static char *tokenbuf = NULL;
    static int   tokidx, toksiz = 0;

(parse.y)
tokenbuf is the buffer, tokidx is the end of the token (since it is an int, it seems to be an index), and toksiz is probably the buffer length. This is also simply structured. If depicted, it would look like Figure 9.
Figure 9: The token buffer
Let’s move on to the interface and read newtok(), which starts a new token.
▼ newtok()

    static char*
    newtok()
    {
        tokidx = 0;
        if (!tokenbuf) {
            toksiz = 60;
            tokenbuf = ALLOC_N(char, 60);
        }
        if (toksiz > 4096) {
            toksiz = 60;
            REALLOC_N(tokenbuf, char, 60);
        }
        return tokenbuf;
    }

(parse.y)
Since there is no interface for initializing the whole buffer, it’s possible that the buffer has not been initialized. Therefore, the first if checks for that and initializes it. ALLOC_N() is a macro defined by ruby and is almost the same as calloc.
The initial value of the allocated length is 60, and if it has become too big (> 4096), it is shrunk back to the small size. Since a token becoming this long is unlikely, this size is realistic.
Next, let’s look at tokadd(), which adds a character to the token buffer.
▼ tokadd()

    static void
    tokadd(c)
        char c;
    {
        tokenbuf[tokidx++] = c;
        if (tokidx >= toksiz) {
            toksiz *= 2;
            REALLOC_N(tokenbuf, char, toksiz);
        }
    }

(parse.y)
At the first line, a character is added. Then, it checks the token length and if it seems about to exceed the buffer end, it performs REALLOC_N(). REALLOC_N() is a realloc() which has the same way of specifying arguments as calloc().
The rest of the interfaces are summarized below.
▼ tokfix() tok() toklen() toklast()

    #define tokfix() (tokenbuf[tokidx]='\0')
    #define tok() tokenbuf
    #define toklen() tokidx
    #define toklast() (tokidx>0?tokenbuf[tokidx-1]:0)

(parse.y)
There’s probably no question.
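The growth and shrink policy of the token buffer is easy to check with a small model. The class below is a sketch under assumptions: TokenBufferModel and its constants are names invented for this illustration, it only tracks the capacity that the C code would hold in toksiz, and tokfix() is left out because NUL termination has no counterpart in Ruby.

```ruby
# Models only the bookkeeping of the token buffer, not real memory.
class TokenBufferModel
  INITIAL_SIZE = 60
  SHRINK_LIMIT = 4096

  attr_reader :capacity

  def initialize
    @chars = []
    @capacity = 0      # 0 stands for "not allocated yet"
  end

  # Start a new token; allocate on first use, shrink if it grew too big.
  def newtok
    @chars.clear
    @capacity = INITIAL_SIZE if @capacity.zero? || @capacity > SHRINK_LIMIT
  end

  # Add one character and double the capacity once the buffer is full.
  def tokadd(c)
    @chars << c
    @capacity *= 2 if @chars.length >= @capacity
  end

  def tok
    @chars.join
  end

  def toklen
    @chars.length
  end

  def toklast
    @chars.last
  end
end

buf = TokenBufferModel.new
buf.newtok
59.times { buf.tokadd("x") }
p buf.capacity          # still the initial size
buf.tokadd("x")
p buf.capacity          # doubled, because the 60th character filled it
4940.times { buf.tokadd("x") }
p buf.capacity          # grew by repeated doubling
buf.newtok
p buf.capacity          # shrunk back for the next token
```

Doubling keeps the number of reallocations small for long tokens, while the check in newtok() prevents one unusually long token from keeping a large buffer alive forever.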
yylex()
yylex() is very long. Currently, there are more than 1000 lines. Most of them are occupied by a huge switch statement, which branches based on each character. First, I’ll show the whole structure with some parts of it left out.
▼ yylex outline

    static int
    yylex()
    {
        static ID last_id = 0;
        register int c;
        int space_seen = 0;
        int cmd_state;

        if (lex_strterm) {
            /* ... string scan ... */
            return token;
        }
        cmd_state = command_start;
        command_start = Qfalse;
      retry:
        switch (c = nextc()) {
          case '\0':                /* NUL */
          case '\004':              /* ^D */
          case '\032':              /* ^Z */
          case -1:                  /* end of script. */
            return 0;

            /* white spaces */
          case ' ': case '\t': case '\f': case '\r':
          case '\13': /* '\v' */
            space_seen++;
            goto retry;

          case '#':                 /* it's a comment */
            while ((c = nextc()) != '\n') {
                if (c == -1)
                    return 0;
            }
            /* fall through */
          case '\n':
            /* ... omission ... */

          case xxxx:
                :
            break;
                :
            /* branches a lot for each character */
                :
                :
          default:
            if (!is_identchar(c) || ISDIGIT(c)) {
                rb_compile_error("Invalid char `\\%03o' in expression", c);
                goto retry;
            }

            newtok();
            break;
        }

        /* ... deal with ordinary identifiers ... */
    }

(parse.y)
As for the return value of yylex(), zero means that the input has finished, non-zero means a symbol.
Be careful that an extremely concise variable named “c” is used all over this function. space_seen++ when reading a space will become helpful later.
All it has to do for the rest is to keep branching for each character and processing it, but since a long run of monotonous procedures would be boring for readers, we’ll narrow them down to a few points. Not all characters will be explained in this book, but it is easy if you extend the same pattern.
'!'
Let’s start with what is simple first.
▼ yylex - '!'

      case '!':
        lex_state = EXPR_BEG;
        if ((c = nextc()) == '=') {
            return tNEQ;
        }
        if (c == '~') {
            return tNMATCH;
        }
        pushback(c);
        return '!';

(parse.y)
I wrote out the meaning of the code, so I’d like you to read them by comparing them with each other.

    case '!':
      move to EXPR_BEG
      if (the next character is '=' then) {
          token is "!=" (tNEQ)
      }
      if (the next character is '~' then) {
          token is "!~" (tNMATCH)
      }
      if it is neither, push the read character back
      token is '!'

This case clause is short, but describes an important rule of the scanner. It is “the longest match rule”. The two characters "!=" can be interpreted in two ways: “! and =” or “!=”, but in this case "!=" must be selected. The longest match is essential for scanners of programming languages.
And lex_state is the variable that represents the state of the scanner. This will be discussed at great length in the next chapter “Finite-State Scanner”, so you can ignore it for now. EXPR_BEG indicates “it is clearly at the beginning”. This is because whether it is the ! of not, or !=, or !~, the next symbol is the beginning of an expression.
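The longest match rule for '!' can be tried out with a tiny Ruby function. This is a simplified sketch: scan_bang is a hypothetical name, it works on a plain string instead of the input buffer, and it ignores lex_state completely.

```ruby
# Returns [token, consumed_length] for an input that starts with "!".
# The two-character tokens are tried first, which is the longest match rule.
def scan_bang(src)
  raise ArgumentError, "input must start with '!'" unless src.start_with?("!")
  case src[1]
  when "=" then [:tNEQ, 2]
  when "~" then [:tNMATCH, 2]
  else          ["!", 1]       # neither: only the "!" itself is consumed
  end
end

p scan_bang("!= b")
p scan_bang("!~ /re/")
p scan_bang("!flag")
```

Returning the consumed length plays the role of pushback() here: in the last case the character after "!" is looked at but not consumed.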
'>'
Next, we’ll look at '>' as an example of using yylval (the value of a symbol).
▼ yylex - '>'

      case '>':
        switch (lex_state) {
          case EXPR_FNAME: case EXPR_DOT:
            lex_state = EXPR_ARG; break;
          default:
            lex_state = EXPR_BEG; break;
        }
        if ((c = nextc()) == '=') {
            return tGEQ;
        }
        if (c == '>') {
            if ((c = nextc()) == '=') {
                yylval.id = tRSHFT;
                lex_state = EXPR_BEG;
                return tOP_ASGN;
            }
            pushback(c);
            return tRSHFT;
        }
        pushback(c);
        return '>';

(parse.y)
The places except for yylval can be ignored. Concentrating on only one point when reading a program is essential.
At this point, for the symbol tOP_ASGN of >>=, it sets tRSHFT as its value. Since the union member used is id, its type is ID. tOP_ASGN is the symbol of self assignment; it represents all of the things like += and -= and *=. In order to distinguish them later, it passes the type of the self assignment as a value.
The reason why the self assignments are bundled is that it makes the rules shorter. Bundling things that can be bundled at the scanner as much as possible makes the rules more concise. Then, why are the binary arithmetic operators not bundled? It is because they differ in their precedences.
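The idea of one bundled symbol that carries its operator as a value can be sketched as follows. This is an illustration only: scan_gt is a hypothetical name, the hash keys are invented for this example, and the :value entry stands in for what the C code stores in yylval.id.

```ruby
# Scans an input starting with ">" and returns the token together with
# its semantic value (nil when the token needs none).
def scan_gt(src)
  raise ArgumentError, "input must start with '>'" unless src.start_with?(">")
  if src[1] == "="
    { token: :tGEQ, value: nil, length: 2 }
  elsif src[1] == ">"
    if src[2] == "="
      # ">>=" is reported as the bundled tOP_ASGN; the operator is the value.
      { token: :tOP_ASGN, value: :tRSHFT, length: 3 }
    else
      { token: :tRSHFT, value: nil, length: 2 }
    end
  else
    { token: ">", value: nil, length: 1 }
  end
end

p scan_gt(">>= 2")
p scan_gt(">> 2")
p scan_gt(">= 2")
p scan_gt("> 2")
```

With this arrangement the grammar needs only one rule mentioning tOP_ASGN, and the action attached to it can look at the value to find out which operator was meant.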
':'
Ifscanningiscompletelyindependentfromparsing,thistalkwouldbesimple.Butinreality,itisnotthatsimple.TheRubygrammarisparticularlycomplex,ithasasomewhatdifferentmeaningwhenthere’saspaceinfrontofit,thewaytosplittokensischangeddependingonthesituationaround.Thecodeof':'shownbelowisanexamplethataspacechangesthebehavior.
▼yylex−':'
3761case':':3762c=nextc();3763if(c==':'){3764if(lex_state==EXPR_BEG||lex_state==EXPR_MID||3765(IS_ARG()&&space_seen)){3766lex_state=EXPR_BEG;3767returntCOLON3;3768}3769lex_state=EXPR_DOT;3770returntCOLON2;3771}3772pushback(c);3773if(lex_state==EXPR_END||lex_state==EXPR_ENDARG||ISSPACE(c)){3774lex_state=EXPR_BEG;3775return':';3776}3777lex_state=EXPR_FNAME;3778returntSYMBEG;
(parse.y)
Again, ignore everything related to lex_state and focus on the part around space_seen.
space_seen is the variable that becomes true when there is a space before a token. If that condition is met, meaning there is a space in front of '::', the token becomes tCOLON3; if not, it seems to become tCOLON2. This is as I explained for primary in the previous section.
Identifier
Until now we have only dealt with symbols, so a token was just one or two characters. This time we’ll look at something a little longer: the scanning pattern of identifiers.
First, the outline of yylex was as follows:
    yylex(...)
    {
        switch (c = nextc()) {
          case xxxx:
            ....
          case xxxx:
            ....
          default:
        }

        the scanning code of identifiers
    }
The next code is an extract from the end of the huge switch. It is relatively long, so I’ll show it with comments.
▼ yylex - identifiers
    case '@':                 /* an instance variable or a class variable */
      c = nextc();
      newtok();
      tokadd('@');
      if (c == '@') {         /* @@, meaning a class variable */
          tokadd('@');
          c = nextc();
      }
      if (ISDIGIT(c)) {       /* @1 and such */
          if (tokidx == 1) {
              rb_compile_error("`@%c' is not a valid instance variable name", c);
          }
          else {
              rb_compile_error("`@@%c' is not a valid class variable name", c);
          }
      }
      if (!is_identchar(c)) { /* a strange character appears next to @ */
          pushback(c);
          return '@';
      }
      break;

    default:
      if (!is_identchar(c) || ISDIGIT(c)) {
          rb_compile_error("Invalid char `\\%03o' in expression", c);
          goto retry;
      }

      newtok();
      break;
    }

    while (is_identchar(c)) {   /* between characters that can be used as identifiers */
        tokadd(c);
        if (ismbchar(c)) {      /* if it is the head byte of a multi-byte character */
            int i, len = mbclen(c)-1;

            for (i = 0; i < len; i++) {
                c = nextc();
                tokadd(c);
            }
        }
        c = nextc();
    }
    if ((c == '!' || c == '?') &&
        is_identchar(tok()[0]) &&
        !peek('=')) {      /* the end character of name! or name? */
        tokadd(c);
    }
    else {
        pushback(c);
    }
    tokfix();
(parse.y)
Finally, I’d like you to focus on the condition at the place where ! or ? is added. This part is there so that the following is interpreted as shown.
    obj.m=1       # obj.m  =   1       (not obj.m=)
    obj.m!=1      # obj.m  !=  1       (not obj.m!)
((errata: this code is not related to that condition))
This is “not” longest-match. “Longest-match” is a principle, not a constraint. Sometimes you can refuse it.
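The refusal of longest-match for a trailing ! or ? can be tried out on its own. The sketch below is an assumption-laden simplification: ident_char() stands in for is_identchar() and only knows ASCII, ident_length() is a hypothetical helper working on a plain C string instead of the scanner's buffer, and the check on tok()[0] is reduced to "the name is not empty".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Simplified stand-in for is_identchar(): ASCII letters, digits and '_'. */
int ident_char(int c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/*
 * Return the length of the identifier at the start of `s`.
 * A trailing '!' or '?' is taken as part of the name unless '=' follows it.
 */
size_t ident_length(const char *s)
{
    size_t n = 0;
    assert(s != NULL);

    while (s[n] != '\0' && ident_char(s[n])) n++;
    if (n > 0 && (s[n] == '!' || s[n] == '?') && s[n + 1] != '=') {
        n++;   /* name! or name? */
    }
    return n;
}
```

For the input m!=1 this yields the one-character name m, leaving != for the next token, while m! followed by anything other than '=' is taken whole.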
The reserved words
After the identifiers are scanned, there are about 100 more lines of code that determine the actual symbol. In the previous code, instance variables, class variables and local variables are all scanned at once; they are categorized here.
That is fine, but there is one slightly strange part inside it: the part that filters out the reserved words. Since the reserved words do not differ from local variables in the characters they use, scanning them together and categorizing them later is more efficient.
Now, assume str is a char* string. How can we determine whether it is a reserved word? First, of course, there is the way of doing many comparisons with if statements and strcmp(). However, this is not smart at all. It is not flexible, and its running time grows linearly with the number of words. Usually, only the data is separated out into a list or a hash in order to keep the code short.
    /* convert the code to data */
    struct entry {char *name; int symbol;};
    struct entry table[] = {
        {"if",     kIF},
        {"unless", kUNLESS},
        {"while",  kWHILE},
        /* …… omission …… */
    };

    {
        ....
        return lookup_symbol(table, tok());
    }
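To make the "code to data" idea concrete, here is one way such a lookup_symbol() could be written. It is only a sketch: the symbol values K_IF and friends, the NOT_RESERVED marker, the table name, and the extra length parameter are assumptions made for this example, not what ruby does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol values; in ruby they come from the yacc-generated header. */
enum { NOT_RESERVED = -1, K_IF = 300, K_UNLESS, K_WHILE };

struct entry {
    const char *name;
    int symbol;
};

const struct entry reserved_table[] = {
    {"if",     K_IF},
    {"unless", K_UNLESS},
    {"while",  K_WHILE},
};

/* Linear search: simple, but the cost grows with the number of entries. */
int lookup_symbol(const struct entry *table, size_t n, const char *name)
{
    size_t i;
    assert(table != NULL && name != NULL);

    for (i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) == 0) return table[i].symbol;
    }
    return NOT_RESERVED;
}
```

Adding or removing a reserved word now means editing one table row, but every lookup still walks the table, which is the weakness a hash addresses.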
So how does ruby do it? It uses a hash table. Furthermore, it is a perfect hash. As I said when talking about st_table, if you know the set of possible keys beforehand, you can sometimes create a hash function that never collides. For the reserved words, “the set of possible keys is known beforehand”, so it is likely that we can create a perfect hash function.
But “being able to create one” and actually creating one are different things. Creating it by hand is far too cumbersome. Since reserved words can be added or removed, this kind of process must be automated.
This is where gperf comes in. gperf is one of the GNU products; it generates a perfect hash function from a set of values. For the detailed usage of gperf itself, I recommend running man gperf. Here I’ll only describe how to use the generated result.
In ruby, the input file for gperf is keywords and the output is lex.c. parse.y #includes it directly. Generally, #include-ing C files is not good practice, but splitting off a separate file for just one function is worse. In ruby in particular, extern functions may end up being used by extension libraries without anyone noticing, so a function that does not want to promise compatibility should be static.
In lex.c, a function named rb_reserved_word() is defined. You can look a word up by calling it with the char* of the word as the key. The return value is NULL if the word is not found, or a struct kwtable* if it is found (in other words, if the argument is a reserved word). The definition of struct kwtable is as follows:
▼ kwtable
    struct kwtable {char *name; int id[2]; enum lex_state state;};
(keywords)
name is the name of the reserved word, id[0] is its symbol, and id[1] is its symbol as a modifier (kIF_MOD and such). state is “the lex_state to move to after reading this reserved word”. lex_state will be explained in the next chapter.
This is the place where the lookup actually happens.
▼ yylex() - identifier - call rb_reserved_word()
    struct kwtable *kw;

    /* See if it is a reserved word.  */
    kw = rb_reserved_word(tok(), toklen());
    if (kw) {
(parse.y)
Strings
The double quote (") part of yylex() is this.
▼ yylex − '"'
    case '"':
      lex_strterm = NEW_STRTERM(str_dquote, '"', 0);
      return tSTRING_BEG;
(parse.y)
Surprisingly, it finishes after scanning only the first character. Then, this time, taking a look at the rules, tSTRING_BEG is found in the following part:
▼ rules for strings
    string1         : tSTRING_BEG string_contents tSTRING_END

    string_contents :
                    | string_contents string_content

    string_content  : tSTRING_CONTENT
                    | tSTRING_DVAR string_dvar
                    | tSTRING_DBEG term_push compstmt '}'

    string_dvar     : tGVAR
                    | tIVAR
                    | tCVAR
                    | backref

    term_push       :
These rules were introduced to deal with expressions embedded inside strings. tSTRING_CONTENT is the literal part, and tSTRING_DBEG is "#{". tSTRING_DVAR represents “a # in front of a variable”. For example,
    ".....#$gvar...."
this kind of syntax. I have not explained it, but when the embedded expression is only a variable, { and } can be left out. This is often not recommended, though. The D of DVAR and DBEG seems to be an abbreviation of dynamic.
And backref represents the special variables related to regular expressions, such as $1 $2 or $& $'.
term_push is “a rule defined for its action”.
Now, we’ll go back to yylex() here. If it simply returns to the parser, then since the context is the “interior” of a string, it would be a problem if variables, if, and the like were suddenly scanned in the next yylex(). What plays an important role there is…
    case '"':
      lex_strterm = NEW_STRTERM(str_dquote, '"', 0);
      return tSTRING_BEG;
…lex_strterm. Let’s go back to the beginning of yylex().
▼ the beginning of yylex()
    static int
    yylex()
    {
        static ID last_id = 0;
        register int c;
        int space_seen = 0;
        int cmd_state;

        if (lex_strterm) {
            /* scanning string */
            return token;
        }
        cmd_state = command_start;
        command_start = Qfalse;
      retry:
        switch (c = nextc()) {
(parse.y)
If lex_strterm exists, it enters string mode unconditionally. Conversely, this means that whenever lex_strterm is set, a string is being scanned, so while parsing an expression embedded inside a string you have to set lex_strterm to 0. And when the embedded expression ends, you have to set it back. This is done in the following part:
▼ string_content
    string_content  : ....
                    | tSTRING_DBEG term_push
                        {
                            $<num>1 = lex_strnest;
                            $<node>$ = lex_strterm;
                            lex_strterm = 0;
                            lex_state = EXPR_BEG;
                        }
                      compstmt '}'
                        {
                            lex_strnest = $<num>1;
                            quoted_term = $2;
                            lex_strterm = $<node>3;
                            if (($$ = $4) && nd_type($$) == NODE_NEWLINE) {
                                $$ = $$->nd_next;
                                rb_gc_force_recycle((VALUE)$4);
                            }
                            $$ = NEW_EVSTR($$);
                        }
(parse.y)
In the embedded action, lex_strterm is saved as a semantic value on the parser stack (lex_strnest goes into the value of tSTRING_DBEG, lex_strterm into the value of the action itself); virtually, this is a stack push. It is restored in the ordinary action (pop). This is a fairly smart way.
But why does it do something this tedious? Couldn’t it scan normally and then call yyparse() recursively at the point where it finds #{? There is actually a problem: yyparse() can’t be called recursively. This is a well-known limitation of yacc. Since yylval, which is used to receive and pass values, is a global variable, careless recursive calls can destroy its value. With bison (the yacc of GNU), recursive calls are possible by using the %pure_parser directive, but the current ruby decided not to assume bison. In reality, byacc (Berkeley yacc) is often used on BSD-derived OSes, Windows and such, so assuming bison would be a little cumbersome.
lex_strterm
As we’ve seen, when you consider lex_strterm as a boolean value, it represents whether or not the scanner is in string mode. But its contents also have a meaning. First, let’s look at its type.
▼ lex_strterm
    static NODE *lex_strterm;
(parse.y)
This definition shows that its type is NODE*. This is the type used for the syntax tree and will be discussed in detail in Chapter 12: Syntax tree construction. For the time being, you should remember only two points: it is a structure that has three elements, and since it is a VALUE you don’t have to free() it.
▼ NEW_STRTERM()
    #define NEW_STRTERM(func, term, paren) \
            rb_node_newnode(NODE_STRTERM, (func), (term), (paren))
(parse.y)
This is a macro to create a node to be stored in lex_strterm. First, term is the terminating character of the string. For example, for a " string it is ", and for a ' string it is '.
paren is used to store the corresponding opening parenthesis when it is a % string. For example,
    %Q(..........)
in this case, paren stores '(' and term stores the closing parenthesis ')'. If it is not a % string, paren is 0.
Lastly, func indicates the type of the string. The available types are defined as follows:
▼ func
    #define STR_FUNC_ESCAPE 0x01  /* backslash notations such as \n are in effect  */
    #define STR_FUNC_EXPAND 0x02  /* embedded expressions are in effect */
    #define STR_FUNC_REGEXP 0x04  /* it is a regular expression */
    #define STR_FUNC_QWORDS 0x08  /* %w(....) or %W(....) */
    #define STR_FUNC_INDENT 0x20  /* <<-EOS (the finishing symbol can be indented) */

    enum string_type {
        str_squote = (0),
        str_dquote = (STR_FUNC_EXPAND),
        str_xquote = (STR_FUNC_ESCAPE|STR_FUNC_EXPAND),
        str_regexp = (STR_FUNC_REGEXP|STR_FUNC_ESCAPE|STR_FUNC_EXPAND),
        str_sword  = (STR_FUNC_QWORDS),
        str_dword  = (STR_FUNC_QWORDS|STR_FUNC_EXPAND),
    };
(parse.y)
Each member of enum string_type has the following meaning:
    str_squote    ' string / %q
    str_dquote    " string / %Q
    str_xquote    command string (not explained in this book)
    str_regexp    regular expression
    str_sword     %w
    str_dword     %W
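Because each string type is a combination of flag bits, a single property can be tested with a bitwise AND. The sketch below copies the values from the listing above; str_func_has() is a hypothetical helper added for illustration and does not appear in parse.y.

```c
#include <assert.h>

#define STR_FUNC_ESCAPE 0x01
#define STR_FUNC_EXPAND 0x02
#define STR_FUNC_REGEXP 0x04
#define STR_FUNC_QWORDS 0x08
#define STR_FUNC_INDENT 0x20

enum string_type {
    str_squote = (0),
    str_dquote = (STR_FUNC_EXPAND),
    str_xquote = (STR_FUNC_ESCAPE|STR_FUNC_EXPAND),
    str_regexp = (STR_FUNC_REGEXP|STR_FUNC_ESCAPE|STR_FUNC_EXPAND),
    str_sword  = (STR_FUNC_QWORDS),
    str_dword  = (STR_FUNC_QWORDS|STR_FUNC_EXPAND)
};

/* Hypothetical helper: nonzero when `func` has every bit of `flag` set. */
int str_func_has(int func, int flag)
{
    assert(flag != 0);
    return (func & flag) == flag;
}
```

For example, str_func_has(str_dword, STR_FUNC_EXPAND) is true while str_func_has(str_sword, STR_FUNC_EXPAND) is false, which matches the difference between %W and %w.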
String scan function
The rest is reading yylex() in string mode, in other words, the if at the beginning.
▼ yylex − string
    if (lex_strterm) {
        int token;
        if (nd_type(lex_strterm) == NODE_HEREDOC) {
            token = here_document(lex_strterm);
            if (token == tSTRING_END) {
                lex_strterm = 0;
                lex_state = EXPR_END;
            }
        }
        else {
            token = parse_string(lex_strterm);
            if (token == tSTRING_END || token == tREGEXP_END) {
                rb_gc_force_recycle((VALUE)lex_strterm);
                lex_strterm = 0;
                lex_state = EXPR_END;
            }
        }
        return token;
    }
(parse.y)
It is divided into two major groups: here documents and everything else. But this time, we won’t read parse_string(). As I described previously, there are a lot of conditions and it has become tremendous spaghetti code. If I tried to explain it, odds are high that readers would complain that “it just does what the code says!”. Furthermore, although it takes a lot of effort, it is not interesting.
But not explaining it at all is not good either, so a modified version in which a separate function is defined for each kind of target to be scanned is included on the attached CD-ROM (doc/parse_string.html). I’d like interested readers to take a look at it.
Here Document
In comparison to ordinary strings, here documents are fairly interesting. That may be because, unlike the other elements, they deal with a line at a time. Moreover, it is terrific that the starting symbol can appear in the middle of a program. First, I’ll show the code of yylex() that scans the starting symbol of a here document.
▼ yylex − '<'
    case '<':
      c = nextc();
      if (c == '<' &&
          lex_state != EXPR_END &&
          lex_state != EXPR_DOT &&
          lex_state != EXPR_ENDARG &&
          lex_state != EXPR_CLASS &&
          (!IS_ARG() || space_seen)) {
          int token = heredoc_identifier();
          if (token) return token;
(parse.y)
As usual, we’ll ignore the herd of lex_state conditions. Then we can see that only “<<” is read here and the rest is scanned in heredoc_identifier(). So, here is heredoc_identifier().
▼ heredoc_identifier()
    static int
    heredoc_identifier()
    {
        /* ... omission ... reading the starting symbol */
        tokfix();
        len = lex_p - lex_pbeg;   /* (A) */
        lex_p = lex_pend;         /* (B) */
        lex_strterm = rb_node_newnode(NODE_HEREDOC,
                            rb_str_new(tok(), toklen()),  /* nd_lit */
                            len,                          /* nd_nth */
            /* (C) */       lex_lastline);                /* nd_orig */

        return term == '`' ? tXSTRING_BEG : tSTRING_BEG;
    }
(parse.y)
The part that reads the starting symbol (<<EOS) is not important, so it is left out entirely. By this point, the input buffer has probably become as depicted in Figure 10. Let’s recall that the input buffer reads a line at a time.
Figure 10: scanning "printf(<<EOS,n)"
What heredoc_identifier() does is as follows: (A) len is the number of bytes already read in the current line. (B) Then lex_p is abruptly moved to the end of the line. This means that, in the line that has been read, the part after the starting symbol is read but not parsed. When is that remaining part parsed? A hint for this mystery is that at (C) lex_lastline (the line currently being read) and len (the length already read) are saved.
Then, the dynamic call graph before and after heredoc_identifier is shown in simplified form below:
    yyparse
        yylex (case '<')
            heredoc_identifier (lex_strterm = ....)
        yylex (the beginning if)
            here_document
And this here_document() does the scanning of the body of the here document. Omitting invalid cases and adding some comments, here_document() is shown below. Notice that lex_strterm remains unchanged after it was set in heredoc_identifier().
▼ here_document() (simplified)
    here_document(NODE *here)
    {
        VALUE line;                      /* the line currently being scanned */
        VALUE str = rb_str_new("", 0);   /* a string to store the results */

        /* ... handling invalid conditions, omitted ... */

        if (embedded expressions not in effect) {
            do {
                line = lex_lastline;     /* (A) */
                rb_str_cat(str, RSTRING(line)->ptr, RSTRING(line)->len);
                lex_p = lex_pend;        /* (B) */
                if (nextc() == -1) {     /* (C) */
                    goto error;
                }
            } while (the currently read line is not equal to the finishing symbol);
        }
        else {
            /* the embedded expressions are available ... omitted */
        }
        heredoc_restore(lex_strterm);
        lex_strterm = NEW_STRTERM(-1, 0, 0);
        yylval.node = NEW_STR(str);
        return tSTRING_CONTENT;
    }
rb_str_cat() is the function that appends a char* to the end of a Ruby string. That is, at (A) the line currently being read, lex_lastline, is appended to str. Once it has been appended, the current line is of no further use, so at (B) lex_p is abruptly moved to the end of the line. (C) is the tricky part: it looks as if it checks whether the input is finished, but it actually reads the next “line”. I’d like you to recall that nextc() automatically reads the next line when the current line has been read to its end. So, since the current line is forcibly finished at (B), lex_p moves to the next line at (C).
And finally, after leaving the do ~ while loop, comes heredoc_restore().
▼ heredoc_restore()
    static void
    heredoc_restore(here)
        NODE *here;
    {
        VALUE line = here->nd_orig;
        lex_lastline = line;
        lex_pbeg = RSTRING(line)->ptr;
        lex_pend = lex_pbeg + RSTRING(line)->len;
        lex_p = lex_pbeg + here->nd_nth;
        heredoc_end = ruby_sourceline;
        ruby_sourceline = nd_line(here);
        rb_gc_force_recycle(here->nd_lit);
        rb_gc_force_recycle((VALUE)here);
    }
(parse.y)
here->nd_orig holds the line that contains the starting symbol. here->nd_nth holds the length already read in the line that contains the starting symbol. This means scanning can continue from just after the starting symbol as if nothing had happened. (Figure 11)
Figure 11: The picture of assignation of scanning Here Document
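The save, jump ahead, and restore sequence can be modelled without any of ruby's data structures. The following is a rough sketch under several assumptions: the input is an array of lines, Scanner, fetch_line() and read_heredoc() are hypothetical names, and only the case without embedded expressions is covered. cur and col play the roles of lex_lastline and lex_p - lex_pbeg.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char **lines;  /* the input, one element per line (no newline chars) */
    size_t nlines;
    size_t next;         /* index of the next line to fetch from the input */
    const char *cur;     /* current line, like lex_lastline */
    size_t col;          /* offset in the current line, like lex_p - lex_pbeg */
} Scanner;

/* Fetch the next input line; returns 0 at the end of the input. */
int fetch_line(Scanner *sc)
{
    assert(sc != NULL);
    if (sc->next >= sc->nlines) return 0;
    sc->cur = sc->lines[sc->next++];
    sc->col = 0;
    return 1;
}

/*
 * Called right after the starting symbol (e.g. <<EOS) has been read.
 * Saves the position, consumes body lines up to `term`, then jumps back.
 * Returns the number of body lines, or -1 if the terminator never appears.
 */
int read_heredoc(Scanner *sc, const char *term)
{
    const char *orig;
    size_t nth;
    int body = 0;
    assert(sc != NULL && term != NULL);

    orig = sc->cur;          /* like nd_orig */
    nth = sc->col;           /* like nd_nth */
    while (fetch_line(sc)) {
        if (strcmp(sc->cur, term) == 0) {
            sc->cur = orig;  /* like heredoc_restore() */
            sc->col = nth;
            return body;
        }
        body++;
    }
    return -1;
}
```

After read_heredoc() returns, scanning resumes in the middle of the original line, and the next fetch_line() yields the line after the terminator, because the input position itself was never rewound.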
The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License
Translated by Peter Zotov
I’m very grateful to my employer Evil Martians, who sponsored the work, and Nikolay Konovalenko, who put more effort in this translation than I could ever wish for. Without them, I would be still figuring out what COND_LEXPOP() actually does.
Chapter 11 Finite-state scanner
Outline
In theory, the scanner and the parser are completely independent of each other – the scanner is supposed to recognize tokens, while the parser is supposed to process the resulting series of tokens. It would be nice if things were that simple, but in reality they rarely are. Depending on the context of the program, it is often necessary to alter the way tokens are recognized or their symbols. In this chapter we will take a look at the way the scanner and the parser cooperate.
Practical examples
In most programming languages, spaces don’t have any specific meaning unless they are used to separate words. However, Ruby is not an ordinary language, and meanings can change significantly depending on the presence of spaces. Here is an example:
    a[i] = 1      # a[i] = (1)
    a [i]         # a([i])
The former is an example of assigning to an index. The latter is an example of omitting the method call parentheses and passing an array as the argument.
Here is another example.
    a  +  1    # (a) + (1)
    a  +1      # a(+1)
This seems to be really disliked by some.
However, the above examples might give one the impression that only omitting the method call parentheses can be a source of trouble. Let’s look at a different example.
    `cvs diff parse.y`          # command call string
    obj.`("cvs diff parse.y")   # normal method call
Here, the former is a method call using a literal. In contrast, the latter is a normal method call (with ‘`’ being the method name). Depending on the context, they could be handled quite differently.
Below is another example where the behavior changes dramatically:
    print(<<EOS)      # here-document
    ......
    EOS

    list = []
    list << nil       # list.push(nil)
The former is a method call using a here-document. The latter is a method call using an operator.
As demonstrated, Ruby’s grammar contains many parts which are difficult to implement in practice. I couldn’t realistically give a thorough description of all of them in just one chapter, so in this one I will look at the basic principles and those parts which present the most difficulty.
lex_state
There is a variable called “lex_state”. “lex”, obviously, stands for “lexer”. Thus, it is a variable which shows the scanner’s state.
What states are there? Let’s look at the definitions.
▼ enum lex_state
    static enum lex_state {
        EXPR_BEG,      /* ignore newline, +/- is a sign. */
        EXPR_END,      /* newline significant, +/- is a operator. */
        EXPR_ARG,      /* newline significant, +/- is a operator. */
        EXPR_CMDARG,   /* newline significant, +/- is a operator. */
        EXPR_ENDARG,   /* newline significant, +/- is a operator. */
        EXPR_MID,      /* newline significant, +/- is a operator. */
        EXPR_FNAME,    /* ignore newline, no reserved words. */
        EXPR_DOT,      /* right after `.' or `::', no reserved words. */
        EXPR_CLASS,    /* immediate after `class', no here document. */
    } lex_state;
(parse.y)
The EXPR prefix stands for “expression”. EXPR_BEG is “Beginning of expression” and EXPR_DOT is “inside the expression, after the dot”.
To elaborate, EXPR_BEG denotes “Located at the head of the expression”. EXPR_END denotes “Located at the end of the expression”. EXPR_ARG denotes “Before the method parameter”. EXPR_FNAME denotes “Before the method name (such as def)”. The ones not covered here will be analyzed in detail below.
Incidentally, I am led to believe that lex_state actually denotes things like “after parentheses” or “head of statement”, so it shows the state of the parser rather than the scanner. However, it’s still conventionally referred to as the scanner’s state, and here’s why.
The meaning of “state” here is actually subtly different from how it’s usually understood. The “state” of lex_state is “a state under which the scanner does x”. For example, an accurate description of EXPR_BEG would be “A state under which the scanner, if run, will react as if this is at the head of the expression”.
Technically, this “state” can be described as the state of the scanner if we look at the scanner as a state machine. However, delving there would be veering off topic and too tedious. I would refer any interested readers to any textbook on data structures.
Understanding the finite-state scanner
The trick to reading a finite-state scanner is to not try to grasp everything at once. Someone writing a parser would prefer not to use a finite-state scanner. That is to say, they would prefer not to make it the main part of the process. Scanner state management often ends up being an extra part attached to the main part. In other words, there is no such thing as a clean and concise diagram for state transitions.
What one should do is think toward specific goals: “This part is needed to solve this task”, “This code is for overcoming this problem”. Basically, take the code one task at a time. If you start thinking about the mutual relationship between tasks, you’ll invariably end up stuck. Like I said, there is simply no such thing.
However, there still needs to be an overarching objective. When reading a finite-state scanner, that objective would undoubtedly be to understand every state. For example, what kind of state is EXPR_BEG? It is a state where the parser is at the head of the expression.
The static approach
So, how can we understand what a state does? There are three basic approaches:
Look at the name of the state
The simplest and most obvious approach. For example, the name EXPR_BEG obviously refers to the head (beginning) of something.
Observe what changes under this state
Look at the way token recognition changes under the state, then test it in comparison to previous examples.
Look at the state from which it transitions
Look at which state it transitions from and which token causes it. For example, if '\n' is always followed by a transition to a HEAD state, it must denote the head of the line.
Let us take EXPR_BEG as an example. In ruby, all state transitions are expressed as assignments to lex_state, so first we need to grep for assignments of EXPR_BEG to find them. Then we need to note where they occur, for example at '#' and '*' and '!' in yylex(). Then we need to recall the state prior to the transition and consider which case suits best (see Figure 1).
Figure 1: Transition to EXPR_BEG
((errata: 1. Actually when the state is EXPR_DOT, the state after reading a tIDENTIFIER would be either ARG or CMDARG. However, because the author wanted to roughly group them as FNAME/DOT and the others here, these two are shown together. Therefore, to be precise, EXPR_FNAME and EXPR_DOT should have also been separated. 2. ‘)’ does not cause the transition from “everything else” to EXPR_BEG.))
This does indeed look like the head of a statement, especially the '\n' and the ';'. The open parentheses and the comma also suggest that it is the head not just of a statement, but of an expression as well.
The dynamic approach
There are other easy methods to observe the functioning. For example, you can use a debugger to “hook” yylex() and look at lex_state.
Another way is to rewrite the source code to output state transitions. In the case of lex_state we only have a few patterns for assignment and comparison, so the solution would be to grasp them as text patterns and rewrite the code to output state transitions. The CD that comes with this book contains the rubylex-analyser tool. When necessary, I will refer to it in this text.
The overall process looks like this: use a debugger or the aforementioned tool to observe the functioning of the program. Then look at the source code to confirm the acquired data and use it.
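As a rough idea of what "rewriting the code to output state transitions" might look like, every assignment to lex_state could be routed through one function that logs the change. This is only a sketch of the general technique, not how rubylex-analyser is implemented; set_lex_state(), lex_state_names and transition_count are names invented for this example.

```c
#include <assert.h>
#include <stdio.h>

enum lex_state {
    EXPR_BEG, EXPR_END, EXPR_ARG, EXPR_CMDARG, EXPR_ENDARG,
    EXPR_MID, EXPR_FNAME, EXPR_DOT, EXPR_CLASS
};

const char *const lex_state_names[] = {
    "EXPR_BEG", "EXPR_END", "EXPR_ARG", "EXPR_CMDARG", "EXPR_ENDARG",
    "EXPR_MID", "EXPR_FNAME", "EXPR_DOT", "EXPR_CLASS"
};

enum lex_state lex_state = EXPR_BEG;
int transition_count = 0;   /* hypothetical counter, handy for checking */

/* Replacement for a plain `lex_state = next;` that also reports the change. */
void set_lex_state(enum lex_state next)
{
    assert((unsigned)next <= (unsigned)EXPR_CLASS);
    fprintf(stderr, "%s -> %s\n",
            lex_state_names[lex_state], lex_state_names[next]);
    lex_state = next;
    transition_count++;
}
```

Replacing each textual pattern `lex_state = X;` with `set_lex_state(X);` would then produce a trace of transitions on standard error while the scanner runs.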
Descriptionofstates
Here I will give simple descriptions of lex_state states.
EXPR_BEG
Head of expression. Comes immediately after \n ( { [ ! ? : , or the operator op=. The most general state.
EXPR_MID
Comes immediately after the reserved words return break next rescue. Invalidates binary operators such as * or &. Generally similar in function to EXPR_BEG.
EXPR_ARG
Comes immediately after elements which are likely to be the method name in a method call. Also comes immediately after '['. Except for cases where EXPR_CMDARG is used.
EXPR_CMDARG
Comes before the first parameter of a normal method call. For more information, see the section “The do conflict”.
EXPR_END
Used when there is a possibility that the statement is terminal. For example, after a literal or a closing parenthesis. Except for cases when EXPR_ENDARG is used.
EXPR_ENDARG
Special variant of EXPR_END. Comes immediately after the closing parenthesis corresponding to tLPAREN_ARG. Refer to the section “First parameter enclosed in parentheses”.
EXPR_FNAME
Comes before the method name, usually after def, alias, undef or the symbol ':'. A single “`” can be a name.
EXPR_DOT
Comes after the dot in a method call. Handled similarly to EXPR_FNAME. Various reserved words are treated as simple identifiers. A single '`' can be a name.
EXPR_CLASS
Comes after the reserved word class. This is a very limited state.
The following states can be grouped together:
BEG MID
END ENDARG
ARG CMDARG
FNAME DOT
They all express similar conditions. EXPR_CLASS is a little different, but only appears in a limited number of places, not warranting any special attention.
Line-breakhandling
The problem
In Ruby, a statement does not necessarily require a terminator. In C or Java a statement must always end with a semicolon, but Ruby has no such requirement. Statements usually take up only one line, and thus end at the end of the line.
On the other hand, when a statement is clearly continued, this happens automatically. Some conditions for “This statement is clearly continued” are as follows:
After a comma
After an infix operator
Parentheses or brackets are not balanced
Immediately after the reserved word if
Etc.
Implementation
So, what do we need to implement this grammar? Simply having the scanner ignore line-breaks is not sufficient. In a grammar like Ruby’s, where statements are delimited by reserved words on both ends, conflicts don’t happen as frequently as in C-like languages, but when I tried a simple experiment, I couldn’t get it to work until I got rid of return next break and returned the method call parentheses wherever they were omitted. To retain those features we need some kind of terminal symbol for statements’ ends. It doesn’t matter whether it’s \n or ';' but it is necessary.
Two solutions exist – parser-based and scanner-based. For the former, you can just optionally put \n in every place that allows it. For the latter, have the \n passed to the parser only when it has some meaning (ignoring it otherwise).
Which solution to use is up to your preferences, but usually the scanner-based one is used. That way produces more compact code. Moreover, if the rules are overloaded with meaningless symbols, it defeats the purpose of the parser-generator.
To sum up, in Ruby, line-breaks are best handled using the scanner. When a line needs to be continued, the \n will be ignored, and when it needs to be terminated, the \n is passed as a token. In yylex() this is found here:
▼ yylex() - '\n'
    case '\n':
      switch (lex_state) {
        case EXPR_BEG:
        case EXPR_FNAME:
        case EXPR_DOT:
        case EXPR_CLASS:
          goto retry;
        default:
          break;
      }
      command_start = Qtrue;
      lex_state = EXPR_BEG;
      return '\n';
(parse.y)
With EXPR_BEG, EXPR_FNAME, EXPR_DOT, EXPR_CLASS it will be goto retry. That is to say, it’s meaningless and shall be ignored. The label retry is found in front of the large switch in yylex().
In all other instances, line-breaks are meaningful and shall be passed to the parser, after which lex_state is restored to EXPR_BEG. Basically, whenever a line-break is meaningful, it will be the end of expr.
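The decision made in this case can be isolated into a small function to experiment with. This is a sketch that mirrors the switch above; newline_is_token() is a hypothetical helper, and command_start is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

enum lex_state {
    EXPR_BEG, EXPR_END, EXPR_ARG, EXPR_CMDARG, EXPR_ENDARG,
    EXPR_MID, EXPR_FNAME, EXPR_DOT, EXPR_CLASS
};

/*
 * Hypothetical helper. Returns 1 if a line-break seen in `*state` should be
 * passed to the parser as '\n' (and moves the state to EXPR_BEG),
 * or 0 if it should be skipped, leaving the state untouched.
 */
int newline_is_token(enum lex_state *state)
{
    assert(state != NULL);
    switch (*state) {
      case EXPR_BEG:
      case EXPR_FNAME:
      case EXPR_DOT:
      case EXPR_CLASS:
        return 0;               /* corresponds to goto retry */
      default:
        break;
    }
    *state = EXPR_BEG;
    return 1;
}
```

Called with the state EXPR_ARG it reports a real terminator and resets the state, whereas a second line-break right after it is swallowed because the state is now EXPR_BEG.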
I recommend leaving command_start alone for the time being. To reiterate, trying to grasp too many things at once will only end in needless confusion.
Let us now take a look at some examples using the rubylex-analyser tool.
    % rubylex-analyser -e '
    m(a,
      b, c) unless i
    '
    +EXPR_BEG
    EXPR_BEG     C      "\nm"  tIDENTIFIER          EXPR_CMDARG
    EXPR_CMDARG           "("  '('                  EXPR_BEG
                                                  0:cond push
                                                  0:cmd push
    EXPR_BEG     C        "a"  tIDENTIFIER          EXPR_CMDARG
    EXPR_CMDARG           ","  ','                  EXPR_BEG
    EXPR_BEG    S     "\n  b"  tIDENTIFIER          EXPR_ARG
    EXPR_ARG              ","  ','                  EXPR_BEG
    EXPR_BEG    S         "c"  tIDENTIFIER          EXPR_ARG
    EXPR_ARG              ")"  ')'                  EXPR_END
                                                  0:cond lexpop
                                                  0:cmd lexpop
    EXPR_END    S    "unless"  kUNLESS_MOD          EXPR_BEG
    EXPR_BEG    S         "i"  tIDENTIFIER          EXPR_ARG
    EXPR_ARG             "\n"  \n                   EXPR_BEG
    EXPR_BEG     C       "\n"  '                    EXPR_BEG
As you can see, there is a lot of output here, but we only need the left and middle columns. The left column displays the lex_state before it enters yylex() while the middle column displays the tokens and their symbols.
The first token m and the second parameter b are preceded by a line-break, but a \n is appended in front of them and it is not treated as a terminal symbol. That is because the lex_state is EXPR_BEG.
However, in the second to last line \n is used as a terminal symbol. That is because the state is EXPR_ARG.
And that is how it should be used. Let us have another example.
    % rubylex-analyser -e 'class
    C < Object
    end'
    +EXPR_BEG
    EXPR_BEG     C    "class"  kCLASS               EXPR_CLASS
    EXPR_CLASS          "\nC"  tCONSTANT            EXPR_END
    EXPR_END    S         "<"  '<'                  EXPR_BEG
    +EXPR_BEG
    EXPR_BEG    S    "Object"  tCONSTANT            EXPR_ARG
    EXPR_ARG             "\n"  \n                   EXPR_BEG
    EXPR_BEG     C      "end"  kEND                 EXPR_END
    EXPR_END             "\n"  \n                   EXPR_BEG
The reserved word class is followed by EXPR_CLASS so the line-break is ignored. However, the superclass Object is followed by EXPR_ARG, so the \n appears.
    % rubylex-analyser -e 'obj.
    class'
    +EXPR_BEG
    EXPR_BEG     C      "obj"  tIDENTIFIER          EXPR_CMDARG
    EXPR_CMDARG           "."  '.'                  EXPR_DOT
    EXPR_DOT        "\nclass"  tIDENTIFIER          EXPR_ARG
    EXPR_ARG             "\n"  \n                   EXPR_BEG
'.' is followed by EXPR_DOT so the \n is ignored.
Note that class becomes tIDENTIFIER despite being a reserved word. This is discussed in the next section.
Reservedwordsandidenticalmethodnames
Theproblem
In Ruby, reserved words can be used as method names. However, in actuality it’s not as simple as “it can be used” – there exist three possible contexts:
Method definition (def xxxx)
Call (obj.xxxx)
Symbol literal (:xxxx)
All three are possible in Ruby. Below we will take a closer look at each.
First, the method definition. It is preceded by the reserved word def so it should work.
In case of the method call, omitting the receiver can be a source of difficulty. However, the scope of use here is even more limited, and omitting the receiver is actually forbidden. That is, when the method name is a reserved word, the receiver absolutely cannot be omitted. Perhaps it would be more accurate to say that it is forbidden in order to guarantee that parsing is always possible.
Finally, in case of the symbol, it is preceded by the terminal symbol ':' so it also should work. However, regardless of reserved words, the ':' here conflicts with the colon in a?b:c. If this is avoided, there should be no further trouble.
For each of these cases, similarly to before, a scanner-based solution and a parser-based solution exist. For the former, return tIDENTIFIER (for example) for a reserved word that comes after def or . or :. For the latter, make that into a rule. ruby uses both kinds of solution, choosing between them for each of the three cases.
Method definition
The name part of the method definition. This is handled by the parser.
▼ Method definition rule
    | kDEF fname
      f_arglist
      bodystmt
      kEND
    | kDEF singleton dot_or_colon fname
      f_arglist
      bodystmt
      kEND
There exist only two rules for method definition – one for normal methods and one for singleton methods. For both, the name part is fname and it is defined as follows.
▼ fname
    fname   : tIDENTIFIER
            | tCONSTANT
            | tFID
            | op
            | reswords
reswords is a reserved word and op is a binary operator. Both rules consist of simply all terminal symbols lined up, so I won’t go into detail here. Finally, tFID is for names that end with a symbol, such as gsub! and include?.
Method call
Method calls with names identical to reserved words are handled by the scanner. The scan code for reserved words is shown below.
    Scanning the identifier
    result = (tIDENTIFIER or tCONSTANT)

    if (lex_state != EXPR_DOT) {
        struct kwtable *kw;

        /* See if it is a reserved word.  */
        kw = rb_reserved_word(tok(), toklen());
        Reserved word is processed
    }
EXPR_DOT expresses what comes after the method call dot. Under EXPR_DOT reserved words are universally not processed. The symbol for reserved words after the dot becomes either tIDENTIFIER or tCONSTANT.
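The effect of skipping the lookup under EXPR_DOT can be reproduced with a toy classifier. Everything here is a stand-in chosen for illustration: reserved_symbol() replaces rb_reserved_word() with two hard-coded words, and the symbol values are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum lex_state { EXPR_BEG, EXPR_END, EXPR_ARG, EXPR_DOT };
enum { T_IDENTIFIER = 300, K_CLASS, K_IF };   /* hypothetical symbol values */

/* Stand-in for rb_reserved_word(): returns a symbol, or 0 if not reserved. */
int reserved_symbol(const char *word)
{
    if (strcmp(word, "class") == 0) return K_CLASS;
    if (strcmp(word, "if") == 0) return K_IF;
    return 0;
}

/* Decide the symbol of an already scanned word, given the state before it. */
int classify_word(const char *word, enum lex_state state)
{
    assert(word != NULL);
    if (state != EXPR_DOT) {
        int sym = reserved_symbol(word);
        if (sym != 0) return sym;
    }
    return T_IDENTIFIER;   /* after a dot, even "class" is a plain name */
}
```

This is why obj.class in the earlier rubylex-analyser output shows class as tIDENTIFIER.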
Symbols
Reserved word symbols are handled by both the scanner and the parser. First, the rule.
▼ symbol
    symbol  : tSYMBEG sym

    sym     : fname
            | tIVAR
            | tGVAR
            | tCVAR

    fname   : tIDENTIFIER
            | tCONSTANT
            | tFID
            | op
            | reswords
Reserved words (reswords) are explicitly passed through the parser. This is only possible because the special terminal symbol tSYMBEG is present at the start. If the symbol were, for example, ':' it would conflict with the conditional operator (a?b:c) and stall. Thus, the trick is to recognize tSYMBEG on the scanner level.
But how to cause that recognition? Let’s look at the implementation of the scanner.
▼ yylex - ':'
    case ':':
      c = nextc();
      if (c == ':') {
          if (lex_state == EXPR_BEG ||  lex_state == EXPR_MID ||
              (IS_ARG() && space_seen)) {
              lex_state = EXPR_BEG;
              return tCOLON3;
          }
          lex_state = EXPR_DOT;
          return tCOLON2;
      }
      pushback(c);
      if (lex_state == EXPR_END ||
          lex_state == EXPR_ENDARG ||
          ISSPACE(c)) {
          lex_state = EXPR_BEG;
          return ':';
      }
      lex_state = EXPR_FNAME;
      return tSYMBEG;
(parse.y)
The if in the first half handles the situation where there are two consecutive ':'. In this situation, the '::' is scanned in accordance with the leftmost longest match basic rule.
For the next if, the ':' is the aforementioned conditional operator. Both EXPR_END and EXPR_ENDARG come at the end of the expression, so a parameter does not appear. That is to say, since there can’t be a symbol, the ':' is a conditional operator. Similarly, if the next letter is a space (ISSPACE(c)), a symbol is unlikely so it is again a conditional operator.
When none of the above applies, it’s all symbols. In that case, a transition to EXPR_FNAME occurs to prepare for all method names. There is no particular danger to parsing here, but if this is forgotten, the scanner will not pass values to reserved words and value calculation will be disrupted.
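The three-way decision for ':' can be written as a pure function of the state, the following character and space_seen. The sketch below makes some assumptions: classify_colon() and the T_ names are hypothetical, IS_ARG() is taken to mean "the state is EXPR_ARG or EXPR_CMDARG", and the state transitions themselves are omitted.

```c
#include <assert.h>
#include <ctype.h>

enum lex_state {
    EXPR_BEG, EXPR_END, EXPR_ARG, EXPR_CMDARG, EXPR_ENDARG,
    EXPR_MID, EXPR_FNAME, EXPR_DOT, EXPR_CLASS
};
enum { T_COLON = ':', T_COLON2 = 300, T_COLON3, T_SYMBEG };  /* hypothetical */

/*
 * `next` is the character right after the ':' (0..255).
 * `space_seen` is nonzero when whitespace preceded the ':'.
 */
int classify_colon(enum lex_state state, int next, int space_seen)
{
    /* Assumed meaning of IS_ARG(). */
    int is_arg = (state == EXPR_ARG || state == EXPR_CMDARG);
    assert(next >= 0 && next <= 255);

    if (next == ':') {
        if (state == EXPR_BEG || state == EXPR_MID || (is_arg && space_seen)) {
            return T_COLON3;            /* '::' at the head of an expression */
        }
        return T_COLON2;                /* '::' used like a dot */
    }
    if (state == EXPR_END || state == EXPR_ENDARG || isspace(next)) {
        return T_COLON;                 /* the ':' of a ? b : c */
    }
    return T_SYMBEG;                    /* start of a symbol literal */
}
```

For instance, ":sym" at EXPR_BEG gives T_SYMBEG, while the same two characters right after a literal (EXPR_END) give a plain ':'.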
Modifiers
The problem
For example, for if there exist a normal notation and one for postfix modification.
    # Normal notation
    if cond then
      expr
    end

    # Postfix
    expr if cond
This could cause a conflict. The reason can be guessed – again, it’s because method parentheses have been omitted previously. Observe this example:
    call if cond then a else b end
Reading this expression up to the if gives us two possible interpretations.
    call((if ....))
    call() if ....
When unsure, I recommend simply using trial and error and seeing if a conflict occurs. Let us try to handle it with yacc after changing kIF_MOD to kIF in the grammar.
    % yacc parse.y
    parse.y contains 4 shift/reduce conflicts and 13 reduce/reduce conflicts.
As expected, conflicts are aplenty. If you are interested, you can add the option -v to yacc and build a log. The nature of the conflicts should be shown there in great detail.
Implementation
So, what is there to do? In Ruby, on the symbol level (that is, on the scanner level) the normal if is distinguished from the postfix if by them being kIF and kIF_MOD respectively. This also applies to all other postfix operators. In all, there are five – kUNLESS_MOD kUNTIL_MOD kWHILE_MOD kRESCUE_MOD and kIF_MOD. The distinction is made here:
▼ yylex - Reserved word
    struct kwtable *kw;

    /* See if it is a reserved word.  */
    kw = rb_reserved_word(tok(), toklen());
    if (kw) {
        enum lex_state state = lex_state;
        lex_state = kw->state;
        if (state == EXPR_FNAME) {
            yylval.id = rb_intern(kw->name);
        }
        if (kw->id[0] == kDO) {
            if (COND_P()) return kDO_COND;
            if (CMDARG_P() && state != EXPR_CMDARG)
                return kDO_BLOCK;
            if (state == EXPR_ENDARG)
                return kDO_BLOCK;
            return kDO;
        }
        if (state == EXPR_BEG)  /*** Here ***/
            return kw->id[0];
        else {
            if (kw->id[0] != kw->id[1])
                lex_state = EXPR_BEG;
            return kw->id[1];
        }
    }
(parse.y)
This is located at the end of yylex after the identifiers are scanned. The part that handles modifiers is the last (innermost) if〜else.
Whether the return value is altered can be determined by whether or not the state is EXPR_BEG. This is where a modifier is identified. Basically, the variable kw is the key and if you look far above you will find that it is struct kwtable.
I’ve already described in the previous chapter how struct kwtable is a structure defined in keywords and the hash function rb_reserved_word() is created by gperf. I’ll show the structure here again.
▼ keywords - struct kwtable

    struct kwtable {char *name; int id[2]; enum lex_state state;};

(keywords)
I’ve already explained name and id[0]: they are the reserved word name and its symbol. Here I will speak about the remaining members.
First, id[1] is a symbol to deal with modifiers. For example, in the case of if that would be kIF_MOD. When a reserved word does not have a modifier equivalent, id[0] and id[1] contain the same thing.
Because state is enum lex_state, it is the state to which a transition should occur after the reserved word is read. Below is a list created with the kwstat.rb tool which I made. The tool can be found on the CD.
    % kwstat.rb ruby/keywords
    ---- EXPR_ARG
    defined?  super     yield

    ---- EXPR_BEG
    and     case    else    ensure  if      module  or      unless  when
    begin   do      elsif   for     in      not     then    until   while

    ---- EXPR_CLASS
    class

    ---- EXPR_END
    BEGIN     __FILE__  end       nil       retry     true
    END       __LINE__  false     redo      self

    ---- EXPR_FNAME
    alias  def    undef

    ---- EXPR_MID
    break   next    rescue  return

    ---- modifiers
    if      rescue  unless  until   while
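
The choice between id[0] and id[1] can be sketched in a few lines of Ruby. This is only an illustration, not the C code of parse.y: the three-entry KEYWORDS table and the scan_keyword helper are hypothetical, and the special handling of do and of EXPR_FNAME is left out.

```ruby
# Hypothetical miniature of the keyword table. The fields mirror
# struct kwtable (name, id[2], state), but only three words are listed.
Keyword = Struct.new(:name, :ids, :state)

KEYWORDS = {
  "if"    => Keyword.new("if",    [:kIF, :kIF_MOD],       :EXPR_BEG),
  "while" => Keyword.new("while", [:kWHILE, :kWHILE_MOD], :EXPR_BEG),
  "class" => Keyword.new("class", [:kCLASS, :kCLASS],     :EXPR_CLASS),
}.freeze

# Returns [token, next lex_state] for reserved word `word` read in `state`.
def scan_keyword(word, state)
  kw = KEYWORDS.fetch(word)
  # At the beginning of an expression the normal symbol is returned.
  return [kw.ids[0], kw.state] if state == :EXPR_BEG
  # Otherwise the modifier symbol is returned; a real modifier also
  # forces the next state to EXPR_BEG.
  next_state = kw.ids[0] != kw.ids[1] ? :EXPR_BEG : kw.state
  [kw.ids[1], next_state]
end
```

With this sketch, scan_keyword("if", :EXPR_BEG) yields kIF while scan_keyword("if", :EXPR_END) yields kIF_MOD, which is the distinction the grammar relies on.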
The do conflict
The problem
There are two iterator forms: do〜end and {〜}. Their difference is in priority: {〜} has a much higher priority. A higher priority means that as part of the grammar a unit is “small”, which means it can be put into a smaller rule. For example, it can be put not into stmt but expr or primary. In the past {〜} iterators were in primary while do〜end iterators were in stmt.
By the way, there has been a request for an expression like this:

    m do .... end + m do .... end

To allow for this, put the do〜end iterator in arg or primary. Incidentally, the condition for while is expr, meaning it contains arg and primary, so the do will cause a conflict here. Basically, it looks like this:

    while m do
      ....
    end

At first glance, the do looks like the do of while. However, a closer look reveals that it could be a m do〜end bundling. Something that’s not obvious even to a person will definitely cause yacc to conflict. Let’s try it in practice.
    /* do conflict experiment */
    %token kWHILE kDO tIDENTIFIER kEND
    %%
    expr: kWHILE expr kDO expr kEND
        | tIDENTIFIER
        | tIDENTIFIER kDO expr kEND
I simplified the example to only include while, variable referencing and iterators. This rule causes a shift/reduce conflict if the head of the conditional contains tIDENTIFIER. If tIDENTIFIER is used for variable referencing and do is appended to while, then it’s a reduction. If it’s made an iterator do, then it’s a shift.
Unfortunately, in a shift/reduce conflict the shift is prioritized, so if left unchecked, do will become an iterator do. That said, even if a reduction is forced through operator priorities or some other method, do won’t shift at all, becoming unusable. Thus, to solve the problem without any contradictions, we need to either deal with it on the scanner level or write a rule that allows the use of operators without putting the do〜end iterator into expr.
However, not putting do〜end into expr is not a realistic goal. That would require all rules for expr (as well as for arg and primary) to be repeated. This leaves us only the scanner solution.
Rule-level solution
Below is a simplified example of a relevant rule.
▼ do symbol

    primary         : kWHILE expr_value do compstmt kEND

    do              : term
                    | kDO_COND

    primary         : operation brace_block
                    | method_call brace_block

    brace_block     : '{' opt_block_var compstmt '}'
                    | kDO opt_block_var compstmt kEND

As you can see, the terminal symbols for the do of while and for the iterator do are different. For the former it’s kDO_COND while for the latter it’s kDO. Then it’s simply a matter of pointing that distinction out to the scanner.
Symbol-level solution
Below is a partial view of the yylex section that processes reserved words. It’s the only part tasked with processing do, so looking at this code should be enough to understand the criteria for making the distinction.
▼ yylex - Identifier - Reserved word

    if (kw->id[0] == kDO) {
        if (COND_P()) return kDO_COND;
        if (CMDARG_P() && state != EXPR_CMDARG)
            return kDO_BLOCK;
        if (state == EXPR_ENDARG)
            return kDO_BLOCK;
        return kDO;
    }

(parse.y)
It’s a little messy, but you only need the part associated with kDO_COND. That is because only two comparisons are meaningful. The first is the comparison between kDO_COND and kDO/kDO_BLOCK. The second is the comparison between kDO and kDO_BLOCK. The rest are meaningless. Right now we only need to distinguish the conditional do, so leave all the other conditions alone.
Basically, COND_P() is the key.
COND_P()
cond_stack
COND_P() is defined close to the head of parse.y.
▼ cond_stack

    #ifdef HAVE_LONG_LONG
    typedef unsigned LONG_LONG stack_type;
    #else
    typedef unsigned long stack_type;
    #endif

    static stack_type cond_stack = 0;
    #define COND_PUSH(n) (cond_stack = (cond_stack<<1)|((n)&1))
    #define COND_POP() (cond_stack >>= 1)
    #define COND_LEXPOP() do {\
        int last = COND_P();\
        cond_stack >>= 1;\
        if (last) cond_stack |= 1;\
    } while (0)
    #define COND_P() (cond_stack&1)

(parse.y)
The type stack_type is either long (at least 32 bits) or long long (at least 64 bits). cond_stack is initialized by yycompile() at the start of parsing and after that is handled only through macros. All you need, then, is to understand those macros.
If you look at COND_PUSH/POP you will see that these macros use integers as stacks consisting of bits.

    MSB←   →LSB
    ...0000000000   Initial value 0
    ...0000000001   COND_PUSH(1)
    ...0000000010   COND_PUSH(0)
    ...0000000101   COND_PUSH(1)
    ...0000000010   COND_POP()
    ...0000000100   COND_PUSH(0)
    ...0000000010   COND_POP()

As for COND_P(), since it determines whether or not the least significant bit (LSB) is a 1, it effectively determines whether the head of the stack is a 1.
The remaining COND_LEXPOP() is a little weird. It leaves COND_P() at the head of the stack and executes a right shift. Basically, it “crushes” the second bit from the bottom with the lowermost bit.

    MSB←   →LSB
    ...0000000000   Initial value 0
    ...0000000001   COND_PUSH(1)
    ...0000000010   COND_PUSH(0)
    ...0000000101   COND_PUSH(1)
    ...0000000011   COND_LEXPOP()
    ...0000000110   COND_PUSH(0)
    ...0000000011   COND_LEXPOP()

((errata: It leaves COND_P() only when it is 1. When COND_P() is 0 and the second bottom bit is 1, it would become 1 after doing LEXPOP, thus COND_P() is not left in this case.))
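
To experiment with these macros without building ruby, the bit stack can be modelled in Ruby. The following is a minimal sketch that follows the macro definitions literally; the BitStack class and the bits_after helper are names made up for this illustration.

```ruby
# A stack of bits kept in a single integer, modelled on COND_PUSH,
# COND_POP, COND_LEXPOP and COND_P. Names here are illustrative only.
class BitStack
  attr_reader :bits

  def initialize
    @bits = 0
  end

  def push(n)
    @bits = (@bits << 1) | (n & 1)
  end

  def pop
    @bits >>= 1
  end

  # Shift right, but keep the old top bit if it was 1.
  def lexpop
    last = top
    @bits >>= 1
    @bits |= 1 if last == 1
  end

  def top
    @bits & 1
  end
end

# Applies integers as pushes and symbols as operations, then returns
# the stack contents as a binary string.
def bits_after(*ops)
  stack = BitStack.new
  ops.each { |op| op.is_a?(Integer) ? stack.push(op) : stack.public_send(op) }
  stack.bits.to_s(2)
end
```

For example, bits_after(1, 0, 1, :pop) gives "10" while bits_after(1, 0, 1, :lexpop) gives "11". The two operations differ only when the top bit is 1; when it is 0, as in bits_after(1, 0, :lexpop), the result "1" is the same as for a plain pop.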
Now I will explain what that means.
Investigating the function
Let us investigate the function of this stack. To do that I will list up all the parts where COND_PUSH() and COND_POP() are used.

            | kWHILE {COND_PUSH(1);} expr_value do {COND_POP();}
    --
            | kUNTIL {COND_PUSH(1);} expr_value do {COND_POP();}
    --
            | kFOR block_var kIN {COND_PUSH(1);} expr_value do {COND_POP();}
    --
          case '(':
                    :
                    :
            COND_PUSH(0);
            CMDARG_PUSH(0);
    --
          case '[':
                    :
                    :
            COND_PUSH(0);
            CMDARG_PUSH(0);
    --
          case '{':
                    :
                    :
            COND_PUSH(0);
            CMDARG_PUSH(0);
    --
          case ']':
          case '}':
          case ')':
            COND_LEXPOP();
            CMDARG_LEXPOP();

From this we can derive the following general rules.

    At the start of a conditional expression    PUSH(1)
    At opening parenthesis                      PUSH(0)
    At the end of a conditional expression      POP()
    At closing parenthesis                      LEXPOP()
With this, you should see how to use it. If you think about it for a minute, the name COND_P() itself is clearly the name for a macro that determines whether or not it’s on the same level as the conditional expression (see Figure 2).
Figure 2: Changes of COND_P()
Using this trick should also make situations like the one shown below easy to deal with.

    while (m do .... end)   # do is an iterator do (kDO)
      ....
    end

This means that on a 32-bit machine in the absence of long long, if conditional expressions or parentheses are nested at 32 levels, things could get strange. Of course, in reality you won’t need to nest so deep so there’s no actual risk.
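
The nesting limit can be reproduced with a small model. This sketch assumes a 32-bit unsigned stack_type, emulated in Ruby with a mask because Ruby integers do not overflow; the names MASK32 and cond_bit_after_nesting are invented for the illustration, and plain pops are used since the pushed bits are all 0.

```ruby
MASK32 = 0xFFFF_FFFF  # models a 32-bit unsigned long (an assumption)

# Pushes 1 (entering a condition), then `depth` zeros (opening
# parentheses), pops `depth` times and returns the bit left on top.
def cond_bit_after_nesting(depth)
  stack = 1
  depth.times { stack = (stack << 1) & MASK32 }  # COND_PUSH(0)
  depth.times { stack >>= 1 }                    # closing parentheses
  stack & 1
end
```

With 31 levels the original 1 is still there after unwinding, so cond_bit_after_nesting(31) is 1. With 32 levels the bit has been shifted out of the integer and cond_bit_after_nesting(32) is 0, meaning the scanner would no longer know it is inside a condition.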
Finally, the definition of COND_LEXPOP() looks a bit strange; that seems to be a way of dealing with lookahead. However, the rules now do not allow for lookahead to occur, so there’s no purpose to make the distinction between POP and LEXPOP. Basically, at this time it would be correct to say that COND_LEXPOP() has no meaning.
tLPAREN_ARG(1)
The problem
This one is very complicated. It only became workable in Ruby 1.7, and only fairly recently. The core of the issue is interpreting this:

    call (expr) + 1

as one of the following:

    (call(expr)) + 1
    call((expr) + 1)

In the past, it was always interpreted as the former. That is, the parentheses were always treated as “method parameter parentheses”. But since Ruby 1.7 it became possible to interpret it as the latter. Basically, if a space is added, the parentheses become “parentheses of expr”.
I will also provide an example to explain why the interpretation changed. First, I wrote a statement as follows.

    p m() + 1

So far so good. But let’s assume the value returned by m is a fraction and there are too many digits. Then we will have it displayed as an integer.

    p m() + 1 .to_i   # ??

Uh-oh, we need parentheses.

    p (m() + 1).to_i

How to interpret this? Up to 1.6 it will be this:

    (p(m() + 1)).to_i

The much-needed to_i is rendered meaningless, which is unacceptable. To counter that, adding a space between the method name and the parentheses will cause the parentheses to be treated specially as expr parentheses.
For those eager to test this, this feature was implemented in parse.y revision 1.100 (2001-05-31). Thus, it should be relatively prominent when looking at the differences between it and 1.99. This is the command to find the difference.

    ~/src/ruby % cvs diff -r1.99 -r1.100 parse.y
Investigation
First let us look at how the set-up works in reality. Using the ruby-lexer tool (ruby-lexer: located in tools/ruby-lexer.tar.gz on the CD) we can look at the list of symbols corresponding to the program.

    % ruby-lexer -e 'm(a)'
    tIDENTIFIER '(' tIDENTIFIER ')' '\n'

As with ruby, -e is the option to pass the program directly from the command line. With this we can try all kinds of things. Let’s start with the problem at hand: the case where the first parameter is enclosed in parentheses.

    % ruby-lexer -e 'm (a)'
    tIDENTIFIER tLPAREN_ARG tIDENTIFIER ')' '\n'

After adding a space, the symbol of the opening parenthesis became tLPAREN_ARG. Now let’s look at normal expression parentheses.

    % ruby-lexer -e '(a)'
    tLPAREN tIDENTIFIER ')' '\n'

For normal expression parentheses it seems to be tLPAREN. To sum up:

    Input     Symbol of opening parenthesis
    m(a)      '('
    m (a)     tLPAREN_ARG
    (a)       tLPAREN

Thus the focus is distinguishing between the three. For now tLPAREN_ARG is the most important.
The case of one parameter
We’ll start by looking at the yylex() section for '('.
▼ yylex - '('

    case '(':
      command_start = Qtrue;
      if (lex_state == EXPR_BEG || lex_state == EXPR_MID) {
          c = tLPAREN;
      }
      else if (space_seen) {
          if (lex_state == EXPR_CMDARG) {
              c = tLPAREN_ARG;
          }
          else if (lex_state == EXPR_ARG) {
              c = tLPAREN_ARG;
              yylval.id = last_id;
          }
      }
      COND_PUSH(0);
      CMDARG_PUSH(0);
      lex_state = EXPR_BEG;
      return c;

(parse.y)
Since the first if is tLPAREN we’re looking at a normal expression parenthesis. The distinguishing feature is that lex_state is either BEG or MID, that is, it’s clearly at the beginning of the expression.
The following space_seen shows whether the parenthesis is preceded by a space. If there is a space and lex_state is either ARG or CMDARG, basically if it’s before the first parameter, the symbol is not '(' but tLPAREN_ARG. This way, for example, the following situation can be avoided.

    m(              # Parenthesis not preceded by a space. Method parenthesis ('(')
    m arg, (        # Unless first parameter, expression parenthesis (tLPAREN)

When it is neither tLPAREN nor tLPAREN_ARG, the input character c is used as is and becomes '('. This will definitely be a method call parenthesis.
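
The three-way decision can be restated as a small Ruby function. This is a sketch of the branch structure only: classify_lparen is a hypothetical name, the states are plain symbols, and the side effects of the real code (the pushes, yylval and the state change) are omitted.

```ruby
# Decides which symbol an opening parenthesis becomes, given the
# scanner state and whether a space preceded it.
def classify_lparen(lex_state, space_seen)
  if [:EXPR_BEG, :EXPR_MID].include?(lex_state)
    :tLPAREN        # beginning of an expression: expression parenthesis
  elsif space_seen && [:EXPR_CMDARG, :EXPR_ARG].include?(lex_state)
    :tLPAREN_ARG    # "m (a)": first parameter wrapped in parentheses
  else
    :"("            # "m(a)": method call parenthesis
  end
end
```

The three inputs from the table map onto it directly: m(a) is classify_lparen(:EXPR_CMDARG, false), m (a) is classify_lparen(:EXPR_CMDARG, true), and (a) is classify_lparen(:EXPR_BEG, false).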
If such a clear distinction is made on the symbol level, no conflict should occur even if rules are written as usual. Simplified, it becomes something like this:

    stmt         : command_call

    method_call  : tIDENTIFIER '(' args ')'    /* Normal method */

    command_call : tIDENTIFIER command_args    /* Method with parentheses omitted */

    command_args : args

    args         : arg
                 | args ',' arg

    arg          : primary

    primary      : tLPAREN compstmt ')'        /* Normal expression parenthesis */
                 | tLPAREN_ARG expr ')'        /* First parameter enclosed in parentheses */
                 | method_call

Now I need you to focus on method_call and command_call. If you leave the '(' without introducing tLPAREN_ARG, then command_args will produce args, args will produce arg, arg will produce primary. Then '(' will appear in the place of tLPAREN_ARG and conflict with method_call (see Figure 3).
Figure 3: method_call and command_call
The case of two parameters and more
One might think that if the parenthesis becomes tLPAREN_ARG all will be well. That is not so. For example, consider the following.

    m (a, a, a)

Before now, expressions like this one were treated as method calls and did not produce errors. However, if tLPAREN_ARG is introduced, the opening parenthesis becomes an expr parenthesis, and if two or more parameters are present, that will cause a parse error. This needs to be resolved for the sake of compatibility.
Unfortunately, rushing ahead and just adding a rule like

    command_args : tLPAREN_ARG args ')'

will just cause a conflict. Let’s look at the bigger picture and think carefully.

    stmt         : command_call
                 | expr

    expr         : arg

    command_call : tIDENTIFIER command_args

    command_args : args
                 | tLPAREN_ARG args ')'

    args         : arg
                 | args ',' arg

    arg          : primary

    primary      : tLPAREN compstmt ')'
                 | tLPAREN_ARG expr ')'
                 | method_call

    method_call  : tIDENTIFIER '(' args ')'

Look at the first rule of command_args. Here, args produces arg. Then arg produces primary and out of there comes the tLPAREN_ARG rule. And since expr contains arg, once it is expanded it becomes like this:

    command_args : tLPAREN_ARG arg ')'
                 | tLPAREN_ARG arg ')'

This is a reduce/reduce conflict, which is very bad.
So, how can we deal with only 2+ parameters without causing a conflict? We’ll have to write rules that accommodate that situation specifically. In practice, it’s solved like this:
▼ command_args

    command_args    : open_args

    open_args       : call_args
                    | tLPAREN_ARG   ')'
                    | tLPAREN_ARG call_args2  ')'

    call_args       : command
                    | args opt_block_arg
                    | args ',' tSTAR arg_value opt_block_arg
                    | assocs opt_block_arg
                    | assocs ',' tSTAR arg_value opt_block_arg
                    | args ',' assocs opt_block_arg
                    | args ',' assocs ',' tSTAR arg opt_block_arg
                    | tSTAR arg_value opt_block_arg
                    | block_arg

    call_args2      : arg_value ',' args opt_block_arg
                    | arg_value ',' block_arg
                    | arg_value ',' tSTAR arg_value opt_block_arg
                    | arg_value ',' args ',' tSTAR arg_value opt_block_arg
                    | assocs opt_block_arg
                    | assocs ',' tSTAR arg_value opt_block_arg
                    | arg_value ',' assocs opt_block_arg
                    | arg_value ',' args ',' assocs opt_block_arg
                    | arg_value ',' assocs ',' tSTAR arg_value opt_block_arg
                    | arg_value ',' args ',' assocs ','
                                      tSTAR arg_value opt_block_arg
                    | tSTAR arg_value opt_block_arg
                    | block_arg

    primary         : literal
                    | strings
                    | xstring
                           :
                    | tLPAREN_ARG expr  ')'

Here command_args is followed by another level, open_args, which could be left out of the rules without consequence. The key is the second and third rules of this open_args. This form is similar to the recent example, but is actually subtly different. The difference is that call_args2 has been introduced. The defining characteristic of this call_args2 is that the number of parameters is always two or more. This is evidenced by the fact that most rules contain ','. The only exception is assocs, but since assocs does not come out of expr it cannot conflict anyway.
That wasn’t a very good explanation. To put it simply, in a grammar where this:

    command_args    : call_args

doesn’t work, and only in such a grammar, the next rule is used to make an addition. Thus, the best way to think here is “In what kind of grammar would this rule not work?” Furthermore, since a conflict only occurs when the primary of tLPAREN_ARG appears at the head of call_args, the scope can be limited further and the best way to think is “In what kind of grammar does this rule not work when a tIDENTIFIER tLPAREN_ARG sequence appears?” Below are a few examples.

    m (a, a)

This is a situation when the tLPAREN_ARG list contains two or more items.

    m ()

Conversely, this is a situation when the tLPAREN_ARG list is empty.

    m (*args)
    m (&block)
    m (k => v)

This is a situation when the tLPAREN_ARG list contains a special expression (one not present in expr).
This should be sufficient for most cases. Now let’s compare the above with a practical implementation.
▼ open_args (1)

    open_args       : call_args
                    | tLPAREN_ARG   ')'

First, the rule deals with empty lists.
▼ open_args (2)

                    | tLPAREN_ARG call_args2  ')'

    call_args2      : arg_value ',' args opt_block_arg
                    | arg_value ',' block_arg
                    | arg_value ',' tSTAR arg_value opt_block_arg
                    | arg_value ',' args ',' tSTAR arg_value opt_block_arg
                    | assocs opt_block_arg
                    | assocs ',' tSTAR arg_value opt_block_arg
                    | arg_value ',' assocs opt_block_arg
                    | arg_value ',' args ',' assocs opt_block_arg
                    | arg_value ',' assocs ',' tSTAR arg_value opt_block_arg
                    | arg_value ',' args ',' assocs ','
                                      tSTAR arg_value opt_block_arg
                    | tSTAR arg_value opt_block_arg
                    | block_arg

And call_args2 deals with elements containing special types such as assocs, passing of arrays or passing of blocks. With this, the scope is now sufficiently broad.
tLPAREN_ARG(2)
The problem
In the previous section I said that the examples provided should be sufficient for “most” special method call expressions. I said “most” because iterators are still not covered. For example, the statements below will not work:

    m (a) {....}
    m (a) do .... end

In this section we will once again look at the previously introduced parts with solving this problem in mind.
Rule-level solution
Let us start with the rules. The first part here is all familiar rules, so focus on the do_block part.
▼ command_call

    command_call    : command
                    | block_command

    command         : operation command_args

    command_args    : open_args

    open_args       : call_args
                    | tLPAREN_ARG ')'
                    | tLPAREN_ARG call_args2 ')'

    block_command   : block_call

    block_call      : command do_block

    do_block        : kDO_BLOCK opt_block_var compstmt kEND
                    | tLBRACE_ARG opt_block_var compstmt '}'

Both do and { are completely new symbols, kDO_BLOCK and tLBRACE_ARG. Why isn’t it kDO or '{', you ask? In this kind of situation the best answer is an experiment, so we will try replacing kDO_BLOCK with kDO and tLBRACE_ARG with '{' and processing that with yacc.

    % yacc parse.y
    conflicts:  2 shift/reduce, 6 reduce/reduce

It conflicts badly. A further investigation reveals that this statement is the cause.

    m (a), b {....}

That is because this kind of statement is already supposed to work. b {....} becomes primary. And now a rule has been added that concatenates the block with m. That results in two possible interpretations:

    m((a), b) {....}
    m((a), (b {....}))

This is the cause of the conflict, namely the 2 shift/reduce conflicts.
The other conflict has to do with do〜end.

    m((a)) do .... end     # Add do〜end using block_call
    m((a)) do .... end     # Add do〜end using primary

These two conflict. This accounts for the 6 reduce/reduce conflicts.
{〜} iterator
This is the important part. As shown previously, you can avoid a conflict by changing the do and '{' symbols.
▼ yylex - '{'

    case '{':
      if (IS_ARG() || lex_state == EXPR_END)
          c = '{';          /* block (primary) */
      else if (lex_state == EXPR_ENDARG)
          c = tLBRACE_ARG;  /* block (expr) */
      else
          c = tLBRACE;      /* hash */
      COND_PUSH(0);
      CMDARG_PUSH(0);
      lex_state = EXPR_BEG;
      return c;

(parse.y)
IS_ARG() is defined as
▼ IS_ARG

    #define IS_ARG() (lex_state == EXPR_ARG || lex_state == EXPR_CMDARG)

(parse.y)
Thus, when the state is EXPR_ENDARG, IS_ARG() will always be false. In other words, when lex_state is EXPR_ENDARG, the symbol will always become tLBRACE_ARG, so the key to everything is the transition to EXPR_ENDARG.
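
The same decision can be written out as a Ruby sketch, with IS_ARG() expanded inline. The function name classify_lbrace is hypothetical and the pushes and the state change of the real code are left out.

```ruby
# Decides which symbol an opening brace becomes for a given state.
def classify_lbrace(lex_state)
  if [:EXPR_ARG, :EXPR_CMDARG, :EXPR_END].include?(lex_state)
    :"{"            # block attached to a primary
  elsif lex_state == :EXPR_ENDARG
    :tLBRACE_ARG    # block attached to an expr
  else
    :tLBRACE        # hash literal
  end
end
```

Because EXPR_ENDARG never satisfies the first test, classify_lbrace(:EXPR_ENDARG) can only return tLBRACE_ARG, while a brace at the beginning of an expression, classify_lbrace(:EXPR_BEG), is read as a hash.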
EXPR_ENDARG
Now we need to know how to set EXPR_ENDARG. I used grep to find where it is assigned.
▼ Transition to EXPR_ENDARG

    open_args       : call_args
                    | tLPAREN_ARG  {lex_state = EXPR_ENDARG;} ')'
                    | tLPAREN_ARG call_args2 {lex_state = EXPR_ENDARG;} ')'

    primary         : tLPAREN_ARG expr {lex_state = EXPR_ENDARG;} ')'

That’s strange. One would expect the transition to EXPR_ENDARG to occur after the closing parenthesis corresponding to tLPAREN_ARG, but it’s actually assigned before ')'. I ran grep a few more times thinking there might be other parts setting EXPR_ENDARG but found nothing.
Maybe there’s some mistake. Maybe lex_state is being changed some other way. Let’s use rubylex-analyser to visualize the lex_state transition.
    % rubylex-analyser -e 'm (a) { nil }'
    +EXPR_BEG
    EXPR_BEG     C        "m"  tIDENTIFIER          EXPR_CMDARG
    EXPR_CMDARG S         "("  tLPAREN_ARG          EXPR_BEG
                                                  0:cond push
                                                  0:cmd push
                                                  1:cmd push-
    EXPR_BEG     C        "a"  tIDENTIFIER          EXPR_CMDARG
    EXPR_CMDARG           ")"  ')'                  EXPR_END
                                                  0:cond lexpop
                                                  1:cmd lexpop
    +EXPR_ENDARG
    EXPR_ENDARG S         "{"  tLBRACE_ARG          EXPR_BEG
                                                  0:cond push
                                                 10:cmd push
                                                  0:cmd resume
    EXPR_BEG    S       "nil"  kNIL                 EXPR_END
    EXPR_END    S         "}"  '}'                  EXPR_END
                                                  0:cond lexpop
                                                  0:cmd lexpop
    EXPR_END             "\n"  \n                   EXPR_BEG

The lines split into three large columns show the state transitions caused by yylex(). On the left is the state before yylex(). The middle two are the word text and its symbol. Finally, on the right is the lex_state after yylex().
The problem here is the lines that consist only of +EXPR_ENDARG. They indicate a transition occurring during a parser action. According to this, for some reason an action is executed after reading the ')', a transition to EXPR_ENDARG occurs and '{' is nicely changed into tLBRACE_ARG. This is actually a pretty high-level technique: generously (ab)using the (1) of LALR(1).
Abusing the lookahead
ruby -y can bring up a detailed display of the yacc parser engine. This time we will use it to more closely trace the parser.

    % ruby -yce 'm (a) {nil}' 2>&1 | egrep '^Reading|Reducing'
    Reducing via rule 1 (line 303),  -> @1
    Reading a token: Next token is 304 (tIDENTIFIER)
    Reading a token: Next token is 340 (tLPAREN_ARG)
    Reducing via rule 446 (line 2234), tIDENTIFIER  -> operation
    Reducing via rule 233 (line 1222),  -> @6
    Reading a token: Next token is 304 (tIDENTIFIER)
    Reading a token: Next token is 41 (')')
    Reducing via rule 392 (line 1993), tIDENTIFIER  -> variable
    Reducing via rule 403 (line 2006), variable  -> var_ref
    Reducing via rule 256 (line 1305), var_ref  -> primary
    Reducing via rule 198 (line 1062), primary  -> arg
    Reducing via rule 42 (line 593), arg  -> expr
    Reducing via rule 260 (line 1317),  -> @9
    Reducing via rule 261 (line 1317), tLPAREN_ARG expr @9 ')'  -> primary
    Reading a token: Next token is 344 (tLBRACE_ARG)
                             :
                             :

Here we’re using the option -c, which stops the process at just compiling, and -e, which allows a program to be given from the command line. And we’re using grep to single out token read and reduction reports.
Start by looking at the middle of the list. ')' is read. Now look at the end: the reduction (execution) of the embedding action (@9) finally happens. Indeed, this would allow EXPR_ENDARG to be set after the ')' and before the '{'. But is this always the case? Let’s take another look at the part where it’s set.
    Rule 1    tLPAREN_ARG  {lex_state = EXPR_ENDARG;} ')'
    Rule 2    tLPAREN_ARG  call_args2  {lex_state = EXPR_ENDARG;} ')'
    Rule 3    tLPAREN_ARG  expr  {lex_state = EXPR_ENDARG;} ')'

The embedding action can be substituted with an empty rule. For example, we can rewrite rule 1 like this with no change in meaning whatsoever.

    target  : tLPAREN_ARG tmp ')'
    tmp     :
                {
                    lex_state = EXPR_ENDARG;
                }

Assuming that this is before tmp, it’s possible that one terminal symbol will be read by lookahead. Thus we can skip the (empty) tmp and read the next. And if we are certain that lookahead will occur, the assignment to lex_state is guaranteed to change to EXPR_ENDARG after ')'. But is ')' certain to be read by lookahead in this rule?
Ascertaining lookahead
This is actually pretty clear. Think about the following input.

    m () { nil }        # A
    m (a) { nil }       # B
    m (a,b,c) { nil }   # C

I also took the opportunity to rewrite the rule to make it easier to understand (with no actual changes).

    rule1: tLPAREN_ARG             e1  ')'
    rule2: tLPAREN_ARG  one_arg    e2  ')'
    rule3: tLPAREN_ARG  more_args  e3  ')'

    e1:   /* empty */
    e2:   /* empty */
    e3:   /* empty */
First, the case of input A. Reading up to

    m (         # ... tLPAREN_ARG

we arrive before the e1. If e1 is reduced here, another rule cannot be chosen anymore. Thus, a lookahead occurs to confirm whether to reduce e1 and continue with rule1 to the bitter end or to choose a different rule. Accordingly, if the input matches rule1 it is certain that ')' will be read by lookahead.
On to input B. First, reading up to here

    m (         # ... tLPAREN_ARG

a lookahead occurs for the same reason as described above. Further reading up to here

    m (a        # ... tLPAREN_ARG '(' tIDENTIFIER

another lookahead occurs. It occurs because depending on whether what follows is a ',' or a ')' a decision is made between rule2 and rule3. If what follows is a ',' then it can only be a comma to separate parameters, thus rule3, the rule for two or more parameters, is chosen. This is also true if the input is not a simple a but something like an if or a literal. When the input is complete, a lookahead occurs to choose between rule2 and rule3, the rules for one parameter and two or more parameters respectively.
A separate embedding action is present before ')' in every rule. There’s no going back after an action is executed, so the parser will try to postpone executing an action until it is as certain as possible. For that reason, situations when this certainty cannot be gained with a single lookahead should be excluded when building a parser as it is a conflict.
Proceeding to input C.

    m (a, b, c

At this point anything other than rule3 is unlikely so we’re not expecting a lookahead. And yet, that is wrong. If the following is '(' then it’s a method call, but if the following is ',' or ')' it needs to be a variable reference. Basically, this time a lookahead is needed to confirm parameter elements instead of embedding action reduction.
But what about the other inputs? For example, what if the third parameter is a method call?

    m (a, b, c(....)    # ... ',' method_call

Once again a lookahead is necessary because a choice needs to be made between shift and reduction depending on whether what follows is ',' or ')'. Thus, in this rule in all instances the ')' is read before the embedding action is executed. This is quite complicated and more than a little impressive.
But would it be possible to set lex_state using a normal action instead of an embedding action? For example, like this:

                    | tLPAREN_ARG ')' { lex_state = EXPR_ENDARG; }

This won’t do because another lookahead is likely to occur before the action is reduced. This time the lookahead works to our disadvantage. With this it should be clear that abusing the lookahead of a LALR parser is pretty tricky and not something a novice should be doing.
do〜end iterator
So far we’ve dealt with the {〜} iterator, but we still have do〜end left. Since they’re both iterators, one would expect the same solutions to work, but it isn’t so. The priorities are different. For example,

    m a, b {....}          # m(a, (b{....}))
    m a, b do .... end     # m(a, b) do....end

Thus it’s only appropriate to deal with them differently.
That said, in some situations the same solutions do apply. The example below is one such situation.

    m (a) {....}
    m (a) do .... end

In the end, our only option is to look at the real thing. Since we’re dealing with do here, we should look in the part of yylex() that handles reserved words.
▼ yylex - Identifiers - Reserved words - do

    if (kw->id[0] == kDO) {
        if (COND_P()) return kDO_COND;
        if (CMDARG_P() && state != EXPR_CMDARG)
            return kDO_BLOCK;
        if (state == EXPR_ENDARG)
            return kDO_BLOCK;
        return kDO;
    }

(parse.y)
This time we only need the part that distinguishes between kDO_BLOCK and kDO. Ignore kDO_COND. Only look at what’s always relevant in a finite-state scanner.
The decision-making part using EXPR_ENDARG is the same as tLBRACE_ARG so priorities shouldn’t be an issue here. Similarly to '{', the right course of action is probably to make it kDO_BLOCK.
((errata: In the following case, priorities should have an influence. (But it does not in the actual code. It means this is a bug.)

    m m (a) { ... } # This should be interpreted as m(m(a) {...}),
                    # but is interpreted as m(m(a)) {...}
    m m (a) do ... end # as the same as this: m(m(a)) do ... end

))
The problem lies with CMDARG_P() and EXPR_CMDARG. Let’s look at both.
CMDARG_P()
▼ cmdarg_stack

    static stack_type cmdarg_stack = 0;
    #define CMDARG_PUSH(n) (cmdarg_stack = (cmdarg_stack<<1)|((n)&1))
    #define CMDARG_POP() (cmdarg_stack >>= 1)
    #define CMDARG_LEXPOP() do {\
        int last = CMDARG_P();\
        cmdarg_stack >>= 1;\
        if (last) cmdarg_stack |= 1;\
    } while (0)
    #define CMDARG_P() (cmdarg_stack&1)

(parse.y)
The structure and interface (macro) of cmdarg_stack is completely identical to cond_stack. It’s a stack of bits. Since it’s the same, we can use the same means to investigate it. Let’s list up the places which use it. First, during the action we have this:

    command_args    :  {
                            $<num>$ = cmdarg_stack;
                            CMDARG_PUSH(1);
                        }
                      open_args
                        {
                            /* CMDARG_POP() */
                            cmdarg_stack = $<num>1;
                            $$ = $2;
                        }

$<num>$ represents the left value with a forced casting. In this case it comes out as the value of the embedding action itself, so it can be produced in the next action with $<num>1. Basically, it’s a structure where cmdarg_stack is hidden in $$ before open_args and then restored in the next action.
But why use a hide-restore system instead of a simple push-pop? That will be explained at the end of this section.
Searching yylex() for more CMDARG relations, I found this.

    Token            Relation
    '(' '[' '{'      CMDARG_PUSH(0)
    ')' ']' '}'      CMDARG_LEXPOP()

Basically, as long as it is enclosed in parentheses, CMDARG_P() is false.
Consider both, and it can be said that when command_args, a parameter for a method call with parentheses omitted, is not enclosed in parentheses, CMDARG_P() is true.
EXPR_CMDARG
Now let’s take a look at one more condition: EXPR_CMDARG. Like before, let us look for the place where a transition to EXPR_CMDARG occurs.
▼ yylex - Identifiers - State Transitions

    if (lex_state == EXPR_BEG ||
        lex_state == EXPR_MID ||
        lex_state == EXPR_DOT ||
        lex_state == EXPR_ARG ||
        lex_state == EXPR_CMDARG) {
        if (cmd_state)
            lex_state = EXPR_CMDARG;
        else
            lex_state = EXPR_ARG;
    }
    else {
        lex_state = EXPR_END;
    }

(parse.y)
This is code that handles identifiers inside yylex(). Leaving aside that there are a bunch of lex_state tests in here, let’s look first at cmd_state. And what is this?
▼ cmd_state

    static int
    yylex()
    {
        static ID last_id = 0;
        register int c;
        int space_seen = 0;
        int cmd_state;

        if (lex_strterm) {
            /* ……omitted…… */
        }
        cmd_state = command_start;
        command_start = Qfalse;

(parse.y)
Turns out it’s a yylex local variable. Furthermore, an investigation using grep revealed that here is the only place where its value is altered. This means it’s just a temporary variable for storing command_start during a single run of yylex.
When does command_start become true, then?
▼ command_start

    static int command_start = Qtrue;

    static NODE*
    yycompile(f, line)
        char *f;
        int line;
    {
                       :
        command_start = 1;

    static int
    yylex()
    {
                       :
          case '\n':
            /* ……omitted…… */
            command_start = Qtrue;
            lex_state = EXPR_BEG;
            return '\n';

          case ';':
            command_start = Qtrue;

          case '(':
            command_start = Qtrue;

(parse.y)
From this we understand that command_start is a static variable of parse.y that becomes true when one of \n ; ( is scanned.
Summing up what we’ve covered up to now, first, when \n ; ( is read, command_start becomes true and during the next yylex() run cmd_state becomes true.
And here is the code in yylex() that uses cmd_state.
▼ yylex - Identifiers - State transitions

    if (lex_state == EXPR_BEG ||
        lex_state == EXPR_MID ||
        lex_state == EXPR_DOT ||
        lex_state == EXPR_ARG ||
        lex_state == EXPR_CMDARG) {
        if (cmd_state)
            lex_state = EXPR_CMDARG;
        else
            lex_state = EXPR_ARG;
    }
    else {
        lex_state = EXPR_END;
    }

(parse.y)
From this we understand the following: when after \n ; ( the state is EXPR_BEG, MID, DOT, ARG or CMDARG and an identifier is read, a transition to EXPR_CMDARG occurs. However, lex_state can only become EXPR_BEG following a \n ; ( so when a transition occurs to EXPR_CMDARG the lex_state loses its meaning. The lex_state restriction is only important to transitions dealing with EXPR_ARG.
Based on the above we can now think of a situation where the state is EXPR_CMDARG. For example, see the one below. The underscore is the current position.

    m _
    m(m _
    m m _

((errata: The third one “m m _” is not EXPR_CMDARG. (It is EXPR_ARG.)))
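
The transition after an identifier can be checked with a small Ruby model of the branch shown above. The function name state_after_identifier is made up for this sketch, and cmd_state is passed in explicitly rather than derived from command_start.

```ruby
# Returns the lex_state after an identifier is read, given the state
# before it and whether cmd_state (the saved command_start) was true.
def state_after_identifier(lex_state, cmd_state)
  if %i[EXPR_BEG EXPR_MID EXPR_DOT EXPR_ARG EXPR_CMDARG].include?(lex_state)
    cmd_state ? :EXPR_CMDARG : :EXPR_ARG
  else
    :EXPR_END
  end
end
```

Tracing “m m _” with it confirms the errata: the first m is read in EXPR_BEG with cmd_state true and gives EXPR_CMDARG, but cmd_state is false for the second m, so state_after_identifier(:EXPR_CMDARG, false) gives EXPR_ARG.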
Conclusion
Let us now return to the do decision code.
▼ yylex - Identifiers - Reserved words - kDO - kDO_BLOCK

    if (CMDARG_P() && state != EXPR_CMDARG)
        return kDO_BLOCK;

(parse.y)
Inside the parameter of a method call with parentheses omitted but not before the first parameter. That means from the second parameter of command_call onward. Basically, like this:

    m arg, arg do .... end
    m (arg), arg do .... end

Why is the case of EXPR_CMDARG excluded? This example should clear it up.

    m do .... end

This pattern can already be handled using the do〜end iterator which uses kDO and is defined in primary. Thus, including that case would cause another conflict.
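
Putting the whole do decision together, the order of the tests can be sketched in Ruby as follows. The function classify_do is a hypothetical stand-in that takes the values of COND_P(), CMDARG_P() and the saved state as arguments.

```ruby
# Chooses the symbol for the reserved word do, testing the conditions
# in the same order as the yylex code.
def classify_do(cond_p, cmdarg_p, state)
  return :kDO_COND  if cond_p                                # do of while/until/for
  return :kDO_BLOCK if cmdarg_p && state != :EXPR_CMDARG     # m arg, arg do
  return :kDO_BLOCK if state == :EXPR_ENDARG                 # m (a) do
  :kDO                                                       # m do
end
```

For instance, classify_do(false, true, :EXPR_ARG) corresponds to “m arg, arg do” and yields kDO_BLOCK, whereas classify_do(false, true, :EXPR_CMDARG) corresponds to “m do” and falls through to kDO.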
Reality and truth
Did you think we’re done? Not yet. Certainly, the theory is now complete, but only if everything that has been written is correct. As a matter of fact, there is one falsehood in this section. Well, more accurately, it isn’t a falsehood but an inexact statement. It’s in the part about CMDARG_P().
Actually, CMDARG_P() becomes true when inside command_args, that is to say, inside the parameter of a method call with parentheses omitted.
But where exactly is “inside the parameter of a method call with parentheses omitted”? Once again, let us use rubylex-analyser to inspect in detail.

    % rubylex-analyser -e 'm a,a,a,a;'
    +EXPR_BEG
    EXPR_BEG     C        "m"  tIDENTIFIER          EXPR_CMDARG
    EXPR_CMDARG S         "a"  tIDENTIFIER          EXPR_ARG
                                                  1:cmd push-
    EXPR_ARG              ","  ','                  EXPR_BEG
    EXPR_BEG              "a"  tIDENTIFIER          EXPR_ARG
    EXPR_ARG              ","  ','                  EXPR_BEG
    EXPR_BEG              "a"  tIDENTIFIER          EXPR_ARG
    EXPR_ARG              ","  ','                  EXPR_BEG
    EXPR_BEG              "a"  tIDENTIFIER          EXPR_ARG
    EXPR_ARG              ";"  ';'                  EXPR_BEG
                                                  0:cmd resume
    EXPR_BEG     C       "\n"  '                    EXPR_BEG

The 1:cmd push- in the right column is the push to cmdarg_stack. When the rightmost digit in that line is 1, CMDARG_P() becomes true. To sum up, the period of CMDARG_P() can be described as:

    From immediately after the first parameter of a method call with parentheses omitted
    To the terminal symbol following the final parameter

But, very strictly speaking, even this is still not entirely accurate.

    % rubylex-analyser -e 'm a(),a,a;'
    +EXPR_BEG
    EXPR_BEG     C        "m"  tIDENTIFIER          EXPR_CMDARG
    EXPR_CMDARG S         "a"  tIDENTIFIER          EXPR_ARG
                                                  1:cmd push-
    EXPR_ARG              "("  '('                  EXPR_BEG
                                                  0:cond push
                                                 10:cmd push
    EXPR_BEG     C        ")"  ')'                  EXPR_END
                                                  0:cond lexpop
                                                  1:cmd lexpop
    EXPR_END              ","  ','                  EXPR_BEG
    EXPR_BEG              "a"  tIDENTIFIER          EXPR_ARG
    EXPR_ARG              ","  ','                  EXPR_BEG
    EXPR_BEG              "a"  tIDENTIFIER          EXPR_ARG
    EXPR_ARG              ";"  ';'                  EXPR_BEG
                                                  0:cmd resume
    EXPR_BEG     C       "\n"  '                    EXPR_BEG
When the first terminal symbol of the first parameter has been read, CMDARG_P() is true. Therefore, the complete answer would be:

    From the first terminal symbol of the first parameter of a method call with parentheses omitted
    To the terminal symbol following the final parameter

What repercussions does this fact have? Recall the code that uses CMDARG_P().
▼ yylex - Identifiers - Reserved words - kDO - kDO_BLOCK

    if (CMDARG_P() && state != EXPR_CMDARG)
        return kDO_BLOCK;

(parse.y)
EXPR_CMDARG stands for “Before the first parameter of command_call” and is excluded. But wait, this meaning is also included in CMDARG_P(). Thus, the final conclusion of this section:

    EXPR_CMDARG is completely useless
Truth be told, when I realized this, I almost broke down crying. I was sure it had to mean SOMETHING and spent enormous effort analyzing the source, but couldn’t understand anything. Finally, I ran all kinds of tests on the code using rubylex-analyser and arrived at the conclusion that it has no meaning whatsoever.
I didn’t spend so much time doing something meaningless just to fill up more pages. It was an attempt to simulate a situation likely to happen in reality. No program is perfect, all programs contain their own mistakes. Complicated situations like the one discussed here are where mistakes occur most easily, and when they do, reading the source material with the assumption that it’s flawless can really backfire. In the end, when reading the source code, you can only trust what actually happens.
Hopefully, this will teach you the importance of dynamic analysis. When investigating something, focus on what really happens. The source code will not tell you everything. It can’t tell anything other than what the reader infers.
And with this very useful sermon, I close the chapter.
((errata: This confidently written conclusion was wrong. Without EXPR_CMDARG, for instance, this program “m (m do end)” cannot be parsed. This is an example of the fact that correctness is not proved even if dynamic analyses are done so many times.))
Still not the end
Another thing I forgot. I can’t end the chapter without explaining why CMDARG_P() takes that value. Here’s the problematic part:
▼ command_args

    command_args    :  {
                            $<num>$ = cmdarg_stack;
                            CMDARG_PUSH(1);
                        }
                      open_args
                        {
                            /* CMDARG_POP() */
                            cmdarg_stack = $<num>1;
                            $$ = $2;
                        }

    open_args       : call_args

(parse.y)
All things considered, this looks like another influence from lookahead. command_args is always in the following context:
    tIDENTIFIER _
Thus, this looks like a variable reference or a method call. If it’s a variable reference, it needs to be reduced to variable, and if it’s a method call, it needs to be reduced to operation. We cannot decide how to proceed without employing lookahead. Thus a lookahead always occurs at the head of command_args, and CMDARG_PUSH() is executed only after the first terminal symbol of the first parameter has been read.
The reason why POP and LEXPOP exist separately in cmdarg_stack is also here. Observe the following example:
    % rubylex-analyser -e 'm m (a), a'
    -e:1: warning: parenthesize argument(s) for future version
    +EXPR_BEG
    EXPR_BEG     C        "m"  tIDENTIFIER          EXPR_CMDARG
    EXPR_CMDARG S         "m"  tIDENTIFIER          EXPR_ARG
                                                  1:cmd push-
    EXPR_ARG    S         "("  tLPAREN_ARG          EXPR_BEG
                                                  0:cond push
                                                 10:cmd push
                                                101:cmd push-
    EXPR_BEG     C        "a"  tIDENTIFIER          EXPR_CMDARG
    EXPR_CMDARG           ")"  ')'                  EXPR_END
                                                  0:cond lexpop
                                                 11:cmd lexpop
    +EXPR_ENDARG
    EXPR_ENDARG           ","  ','                  EXPR_BEG
    EXPR_BEG    S         "a"  tIDENTIFIER          EXPR_ARG
    EXPR_ARG             "\n"  \n                   EXPR_BEG
                                                 10:cmd resume
                                                  0:cmd resume
Looking only at the parts related to cmd and how they correspond to each other…
      1:cmd push-       parser push(1)
     10:cmd push        scanner push
    101:cmd push-       parser push(2)
     11:cmd lexpop      scanner pop
     10:cmd resume      parser pop(2)
      0:cmd resume      parser pop(1)
The cmd push- with a minus sign at the end is a parser push. Basically, push and pop do not correspond. Originally there were supposed to be two consecutive push- and the stack would become 110, but due to the lookahead the stack became 101 instead. CMDARG_LEXPOP() is a last-resort measure to deal with this. The scanner always pushes 0, so normally what it pops should also always be 0. When it isn’t 0, we can only assume that it’s a 1 that is there because the parser push was late. Thus, that value is left on the stack.
Conversely, at the time of the parser pop the stack is supposed to be back in its normal state, so an ordinary pop shouldn’t cause any trouble. The reason it does not do that is probably just that what matters is that it works right. Whether popping or hiding the value in $$ and restoring it, the process is the same. When you consider all the alterations that will follow, it’s really impossible to tell how lookahead’s behavior will change. Moreover, this problem appears in a grammar that’s going to be forbidden in the future (that’s why there is a warning). To make something like this work, the trick is to consider numerous possible situations and respond to them. And that is why I think this kind of implementation is right for Ruby. Therein lies the real solution.
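The push/lexpop/resume sequence above can be replayed with a small model. This is only a sketch: the class CmdargStack and the method trace_m_m_a are hypothetical names, and the bodies of push and lexpop are written from my understanding of the CMDARG_PUSH() and CMDARG_LEXPOP() macros, which are not shown in this part of the text. resume models the "cmdarg_stack = $<num>1" restore in command_args.

```ruby
# Hypothetical Ruby model of cmdarg_stack, an integer used as a stack of bits.
class CmdargStack
  attr_reader :bits

  def initialize
    @bits = 0
  end

  # Assumed shape of CMDARG_PUSH(n): shift left, put n in the lowest bit.
  def push(n)
    @bits = (@bits << 1) | (n & 1)
  end

  # Assumed shape of CMDARG_LEXPOP(): pop one bit, but keep a 1 that was on top.
  def lexpop
    last = @bits & 1
    @bits >>= 1
    @bits |= 1 if last == 1
  end

  # What command_args does instead of popping: restore the saved value.
  def resume(saved)
    @bits = saved
  end

  def to_s
    @bits.to_s(2)
  end
end

# Replays the cmd lines of the trace for 'm m (a), a'.
def trace_m_m_a
  stack = CmdargStack.new
  log = []
  saved1 = stack.bits
  stack.push(1);        log << stack.to_s   # parser push (1)
  stack.push(0);        log << stack.to_s   # scanner push at "("
  saved2 = stack.bits
  stack.push(1);        log << stack.to_s   # parser push (2), late because of lookahead
  stack.lexpop;         log << stack.to_s   # scanner pop at ")"
  stack.resume(saved2); log << stack.to_s   # parser pop (2)
  stack.resume(saved1); log << stack.to_s   # parser pop (1)
  log
end

puts trace_m_m_a.join(" ")
```

Under these assumptions the printed values follow the analyser output: the late parser push produces 101 instead of 110, and the lexpop turns it into 11 rather than discarding the parser's 1.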
The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License
Chapter 12: Syntax tree construction
Node
NODE
As I’ve already described, a Ruby program is first converted to a syntax tree. To be more precise, a syntax tree is a tree structure made of structs called “nodes”. In ruby, all nodes are of type NODE.
▼ NODE
    typedef struct RNode {
        unsigned long flags;
        char *nd_file;
        union {
            struct RNode *node;
            ID id;
            VALUE value;
            VALUE (*cfunc)(ANYARGS);
            ID *tbl;
        } u1;
        union {
            struct RNode *node;
            ID id;
            int argc;
            VALUE value;
        } u2;
        union {
            struct RNode *node;
            ID id;
            long state;
            struct global_entry *entry;
            long cnt;
            VALUE value;
        } u3;
    } NODE;
(node.h)
As you might infer from the struct name RNode, nodes are Ruby objects. This means the creation and release of nodes are taken care of by ruby’s garbage collector.
Therefore, flags naturally has the same role as basic.flags of the object struct. It means that T_NODE, which is the type of the struct, and flags such as FL_FREEZE are stored in it. As for NODE, in addition to these, its node type is stored in flags.
What does it mean? Since a program could contain various elements such as if, while, def and so on, there are also various corresponding node types. The three available unions are complicated, but for each node type the way these unions are used is fixed to one specific way. For example, the table below shows the case of NODE_IF, the node of if.
    member    union member    role
    u1        u1.node         the condition expression
    u2        u2.node         the body of true
    u3        u3.node         the body of false
And, in node.h, the macros to access each union member are available.
▼ the macros to access NODE
    #define nd_head  u1.node
    #define nd_alen  u2.argc
    #define nd_next  u3.node

    #define nd_cond  u1.node
    #define nd_body  u2.node
    #define nd_else  u3.node

    #define nd_orig  u3.value
                    :
                    :
(node.h)
For example, these are used as follows:
    NODE *head, *tail;
    head->nd_next = tail;    /* head->u3.node = tail */
In the source code, these macros are almost always used. The very few exceptions are only the two places where NODE is created in parse.y and where NODE is marked in gc.c.
By the way, why are such macros used? For one thing, it might be because it’s cumbersome to remember numbers like u1 that are not meaningful by themselves. But what is more important is that nothing should break if the corresponding number is changed, and it’s possible that it will actually be changed. For example, since the condition clause of if does not have to be stored in u1, someone might want to change it to u2 for some reason. But if u1 is used directly, they need to modify a lot of places all over the source code, which is inconvenient. Since nodes are all declared as NODE, it’s hard to find the nodes that represent if. By preparing the access macros, this kind of trouble can be avoided, and conversely we can determine the node types from the macros.
Node Type
I said that the node type is stored in the flags of a NODE struct. We’ll look at in what form this information is stored. A node type can be set by nd_set_type() and obtained by nd_type().
▼ nd_type nd_set_type
    #define nd_type(n) (((RNODE(n))->flags>>FL_USHIFT)&0xff)
    #define nd_set_type(n,t) \
        RNODE(n)->flags = ((RNODE(n)->flags & ~FL_UMASK) \
                           | (((t)<<FL_USHIFT) & FL_UMASK))
(node.h)
▼ FL_USHIFT FL_UMASK
    #define FL_USHIFT    11
    #define FL_UMASK  (0xff<<FL_USHIFT)
(ruby.h)
It won’t be much trouble if we keep our focus on nd_type. Fig.1 shows what it looks like.
Fig.1: The usage of RNode.flags
And, since macros cannot be used from debuggers, the nodetype() function is also available.
▼ nodetype
    static enum node_type
    nodetype(node)                  /* for debug */
        NODE *node;
    {
        return (enum node_type)nd_type(node);
    }
(parse.y)
File Name and Line Number
The nd_file of a NODE holds (the pointer to) the name of the file where the text that corresponds to this node exists. Since there’s the file name, we naturally expect that there’s also the line number, but the corresponding member could not be found around here. Actually, the line number is embedded in flags by the following macros:
▼ nd_line nd_set_line
    #define NODE_LSHIFT (FL_USHIFT+8)
    #define NODE_LMASK  (((long)1<<(sizeof(NODE*)*CHAR_BIT-NODE_LSHIFT))-1)
    #define nd_line(n) \
        ((unsigned int)((RNODE(n)->flags >> NODE_LSHIFT) & NODE_LMASK))
    #define nd_set_line(n,l) \
        RNODE(n)->flags = ((RNODE(n)->flags & ~(-1 << NODE_LSHIFT)) \
                           | (((l) & NODE_LMASK) << NODE_LSHIFT))
(node.h)
nd_set_line() is fairly spectacular. However, as the names suggest, it is certain that nd_set_line() and nd_line() work symmetrically. Thus, if we first examine the simpler nd_line() and grasp the relationship between the parameters, there’s no need to analyze nd_set_line() in the first place.
The first thing is NODE_LSHIFT. As you can guess from the description of the node types in the previous section, it is the number of used bits in flags. FL_USHIFT bits are reserved by the system of ruby (11 bits, ruby.h), and 8 bits are for the node type.
The next thing is NODE_LMASK.
    sizeof(NODE*) * CHAR_BIT - NODE_LSHIFT
This is the number of the remaining bits. Let’s call it restbits. This makes the code a lot simpler.
    #define NODE_LMASK (((long)1 << restbits) - 1)
Fig.2 shows what the above code seems to be doing. Note that a borrow occurs when subtracting 1. We can eventually understand that NODE_LMASK is a sequence filled with 1 whose size is the number of the bits that are still available.
Fig.2: NODE_LMASK
Now, let’s look at nd_line() again.
    (RNODE(n)->flags >> NODE_LSHIFT) & NODE_LMASK
By the right shift, the unused space is shifted to the LSB. The bitwise AND leaves only the unused space. Fig.3 shows how flags is used. Since FL_USHIFT is 11, on a 32-bit machine 32-(11+8)=13 bits are available for the line number.
Fig.3: How flags are used at NODE
… This means that if a line number reaches 2^13=8192 or more, it should be displayed wrongly. Let’s try.
    File.open('overflow.rb', 'w') {|f|
        10000.times { f.puts }
        f.puts 'raise'
    }
With my 686 machine, ruby overflow.rb properly displayed 1809 as a line number. I’ve succeeded. However, if you use a 64-bit machine, you need to create a somewhat bigger file in order to successfully fail.
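The arithmetic behind that 1809 can be checked with a small model of the macros. This is a sketch in Ruby rather than the C of node.h: the flags word is passed around as a plain Integer, WORD_BITS = 32 is an assumption standing in for sizeof(NODE*)*CHAR_BIT on a 32-bit machine, and the node type value 37 is arbitrary.

```ruby
FL_USHIFT   = 11
FL_UMASK    = 0xff << FL_USHIFT
NODE_LSHIFT = FL_USHIFT + 8
WORD_BITS   = 32   # assumption: a 32-bit machine
NODE_LMASK  = (1 << (WORD_BITS - NODE_LSHIFT)) - 1

def nd_type(flags)
  (flags >> FL_USHIFT) & 0xff
end

def nd_set_type(flags, type)
  (flags & ~FL_UMASK) | ((type << FL_USHIFT) & FL_UMASK)
end

def nd_line(flags)
  (flags >> NODE_LSHIFT) & NODE_LMASK
end

def nd_set_line(flags, line)
  # Same effect as flags & ~(-1 << NODE_LSHIFT): keep only the bits below the line field.
  low_bits = flags & ((1 << NODE_LSHIFT) - 1)
  low_bits | ((line & NODE_LMASK) << NODE_LSHIFT)
end

# overflow.rb raises on line 10001 (10000 empty lines, then 'raise').
SAMPLE_FLAGS = nd_set_line(nd_set_type(0, 37), 10_001)

puts NODE_LMASK             # 13 bits of ones
puts nd_type(SAMPLE_FLAGS)  # the type survives the line update
puts nd_line(SAMPLE_FLAGS)  # 10001 truncated to 13 bits
```

With 13 bits the mask is 8191, so 10001 is stored as 10001 - 8192 = 1809, which agrees with the line number reported above.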
rb_node_newnode()
Lastly let’s look at the function rb_node_newnode() that creates a node.
▼ rb_node_newnode()
    NODE*
    rb_node_newnode(type, a0, a1, a2)
        enum node_type type;
        NODE *a0, *a1, *a2;
    {
        NODE *n = (NODE*)rb_newobj();

        n->flags |= T_NODE;
        nd_set_type(n, type);
        nd_set_line(n, ruby_sourceline);
        n->nd_file = ruby_sourcefile;

        n->u1.node = a0;
        n->u2.node = a1;
        n->u3.node = a2;

        return n;
    }
(parse.y)
We’ve seen rb_newobj() in Chapter 5: Garbage collection. It is the function to get a vacant RVALUE. By attaching the T_NODE struct-type flag to it, the initialization as a VALUE is complete. Of course, it’s possible that some values that are not of type NODE* are passed for u1 u2 u3, but they are received as NODE* for the time being. Since the syntax trees of ruby do not contain double and such, if the values are received as pointers, they will never be too small in size.
For the rest of this part, you can forget about the details you’ve learned so far, and assume NODE is
    flags
    nodetype
    nd_line
    nd_file
    u1
    u2
    u3
a struct type that has the above seven members.
Syntax Tree Construction
The role of the parser is to convert the source code, which is a byte sequence, to a syntax tree. Getting the input through the grammar does not finish even half of the task, so we have to assemble nodes and create a tree. In this section, we’ll look at the construction process of that syntax tree.
YYSTYPE
Essentially this chapter is about actions, thus YYSTYPE, which is the type of $$ or $1, becomes important. Let’s look at the %union of ruby first.
▼ %union declaration
    %union {
        NODE *node;
        ID id;
        int num;
        struct RVarmap *vars;
    }
(parse.y)
struct RVarmap is a struct used by the evaluator and holds a block local variable. You can tell the rest. The most used one is of course node.
Landscape with Syntax Trees
I mentioned that looking at the facts first is a principle of code reading. Since what we want to know this time is what the generated syntax tree looks like, we should start by looking at the answer (the syntax tree).
It’s also nice to use debuggers to observe every time, but you can visualize the syntax tree more handily by using the tool nodedump contained in the attached CD-ROM. This tool is originally the NodeDump made by Pragmatic Programmers and remodeled for this book. The original version shows quite explanatory output, but this remodeled version deeply and directly displays the appearance of the syntax tree.
For example, in order to dump the simple expression m(a), you can do as follows:
    % ruby -rnodedump -e 'm(a)'
    NODE_NEWLINE
    nd_file = "-e"
    nd_nth  = 1
    nd_next:
        NODE_FCALL
        nd_mid = 9617 (m)
        nd_args:
            NODE_ARRAY
            nd_alen = 1
            nd_head:
                NODE_VCALL
                nd_mid = 9625 (a)
            nd_next = (null)
The -r option is used to specify the library to be loaded, and the -e is used to pass a program. Then, the syntax tree expression of the program will be dumped.
I’ll briefly explain how to read the content. NODE_NEWLINE, NODE_FCALL and such are the node types. What is written at the same indent level as each node is the content of its node members. For example, the root is NODE_NEWLINE, and it has three members: nd_file, nd_nth, nd_next. nd_file points to the "-e" string of C, nd_nth holds the integer 1 of C, and nd_next holds the next node NODE_FCALL. But since these explanations in text are probably not intuitive, I recommend you also check Fig.4 at the same time.
Fig.4: Syntax Tree
I’ll explain the meaning of each node. NODE_FCALL is a Function CALL. NODE_ARRAY is, as its name suggests, the node of an array, and here it expresses the list of arguments. NODE_VCALL is a Variable or CALL; a reference to an undefined local variable becomes this.
Then, what is NODE_NEWLINE? This is the node that attaches the name of the currently executed file and the line number at runtime, and it is set for each stmt. Therefore, when only thinking about the meaning of the execution, this node can be ignored. When you require nodedump-short instead of nodedump, distractions like NODE_NEWLINE are left out in the first place. Since it is easier to see if it is simple, nodedump-short will be used from now on unless otherwise noted.
Now, we’ll look at the three types of composing elements in order to grasp what the whole syntax tree looks like. The first one is the leaves of a syntax tree. Next, we’ll look at expressions that are combinations of those leaves, which means they are branches of a syntax tree. The last one is the list that lines up the statements, in other words the trunk of a syntax tree.
Leaf
First, let’s start with the edges, the leaves of the syntax tree: literals, variable references and so on. Among the rules, they are what belong to primary, and they are particularly simple even among the primary rules.
    % ruby -rnodedump-short -e '1'
    NODE_LIT
    nd_lit = 1:Fixnum
1 as a numeric value. There’s no twist. However, notice that what is stored in the node is not 1 of C but 1 of Ruby (1 of Fixnum). This is because …
    % ruby -rnodedump-short -e ':sym'
    NODE_LIT
    nd_lit = 9617:Symbol
This way, Symbol is represented by the same NODE_LIT when it becomes a syntax tree. As in the above example, a VALUE is always stored in nd_lit, so it can be handled in completely the same way whether it is a Symbol or a Fixnum when executing. In this way, all we need to do when dealing with it is retrieve the value in nd_lit and return it. Since we create a syntax tree in order to execute it, designing it so that it becomes convenient when executing is the right thing to do.
    % ruby -rnodedump-short -e '"a"'
    NODE_STR
    nd_lit = "a":String
A string. This is also a Ruby string. String literals are copied when actually used.
    % ruby -rnodedump -e '[0,1]'
    NODE_NEWLINE
    nd_file = "-e"
    nd_nth  = 1
    nd_next:
        NODE_ARRAY
        nd_alen = 2
        nd_head:
            NODE_LIT
            nd_lit = 0:Fixnum
        nd_next:
            NODE_ARRAY
            nd_alen = 1
            nd_head:
                NODE_LIT
                nd_lit = 1:Fixnum
            nd_next = (null)
Array. I can’t say this is a leaf, but let’s allow this to be here because it’s also a literal. It looks like a list of NODE_ARRAY with each element node hung from it. The reason why I didn’t use nodedump-short only in this case is … you will understand after finishing this section.
Branch
Next, we’ll focus on “combinations” that are branches. if will be taken as an example.
if
I feel like if is always used as an example; that’s because its structure is simple and there’s no reader who doesn’t know about if, so it is convenient for writers.
Anyway, this is an example of if. For example, let’s convert this code to a syntax tree.
▼ The Source Program
    if true
      'true expr'
    else
      'false expr'
    end
▼ Its syntax tree expression
    NODE_IF
    nd_cond:
        NODE_TRUE
    nd_body:
        NODE_STR
        nd_lit = "true expr":String
    nd_else:
        NODE_STR
        nd_lit = "false expr":String
Here, the previously described nodedump-short is used, so NODE_NEWLINE disappeared. nd_cond is the condition, nd_body is the body of the true case, nd_else is the body of the false case.
Then, let’s look at the code to build this.
▼ if rule
                    | kIF expr_value then
                      compstmt
                      if_tail
                      kEND
                        {
                            $$ = NEW_IF(cond($2), $4, $5);
                            fixpos($$, $2);
                        }
(parse.y)
It seems that NEW_IF() is the macro to create NODE_IF. Among the values of the symbols, $2 $4 $5 are used, thus the correspondences between the symbols of the rule and $n are:
    kIF    expr_value  then  compstmt  if_tail  kEND
     $1          $2     $3        $4       $5    $6
    NEW_IF(expr_value,       compstmt, if_tail)
this way. In other words, expr_value is the condition expression, compstmt ($4) is the case of true, if_tail is the case of false.
On the other hand, the macros to create nodes are all named NEW_xxxx, and they are defined in node.h. Let’s look at NEW_IF().
▼ NEW_IF()
    #define NEW_IF(c,t,e) rb_node_newnode(NODE_IF,c,t,e)
(node.h)
As for the parameters, it seems that c represents condition, t represents then, and e represents else respectively. As described in the previous section, the order of members of a node is not so meaningful, so you don’t need to be careful about parameter names in this kind of place.
And, the cond() which processes the node of the condition expression in the action is a semantic analysis function. This will be described later.
Additionally, fixpos() corrects the line number. NODE is initialized with the file name and the line number of the time when it is “created”. However, for instance, the code of if has already been parsed up to end by the time NODE_IF is created. Thus, the line number would be wrong if it remained untouched. Therefore, it needs to be corrected by fixpos().
    fixpos(dest, src)
This way, the line number of the node dest is set to the one of the node src. As for if, the line number of the condition expression becomes the line number of the whole if expression.
elsif
Subsequently, let’s look at the rule of if_tail.
▼ if_tail
    if_tail         : opt_else
                    | kELSIF expr_value then
                      compstmt
                      if_tail
                        {
                            $$ = NEW_IF(cond($2), $4, $5);
                            fixpos($$, $2);
                        }

    opt_else        : none
                    | kELSE compstmt
                        {
                            $$ = $2;
                        }
(parse.y)
First, this rule expresses “a list that ends with opt_else after zero or more elsif clauses”. That’s because if_tail appears again and again while elsif continues, and it disappears when opt_else comes in. We can understand this by expanding the rule an arbitrary number of times.
    if_tail: kELSIF .... if_tail
    if_tail: kELSIF .... kELSIF .... if_tail
    if_tail: kELSIF .... kELSIF .... kELSIF .... if_tail
    if_tail: kELSIF .... kELSIF .... kELSIF .... opt_else
    if_tail: kELSIF .... kELSIF .... kELSIF .... kELSE compstmt
Next, let’s focus on the actions. Surprisingly, elsif uses the same NEW_IF() as if. It means that the two programs below lose their difference after they become syntax trees.
    if cond1                  if cond1
      body1                     body1
    elsif cond2               else
      body2                     if cond2
    elsif cond3                   body2
      body3                     else
    else                          if cond3
      body4                         body3
    end                           else
                                    body4
                                  end
                                end
                              end
Come to think of it, in C and such languages, there’s no distinction between the two even at the syntax level. Thus this might be a matter of course. Likewise, the conditional operator (a?b:c) becomes indistinguishable from an if statement after they become syntax trees.
The precedences were very meaningful in the context of grammar, but they become unnecessary because the structure of a syntax tree contains that information. And the difference in appearance, such as between if and the conditional operator, becomes completely meaningless; only the meaning (the behavior) matters. Therefore, there’s no problem at all if if and the conditional operator are the same in their syntax tree expression.
I’ll introduce a few more examples. and and && become the same. or and || are also equal to each other. not and !, if and modifier if, and so on. These pairs also become equal to each other.
Left Recursive and Right Recursive
By the way, the symbol of a list was always written at the left side when expressing a list in Chapter 9: yacc crash course. However, have you noticed it becomes opposite in if_tail? I’ll show only the crucial part again.
    if_tail: opt_else
           | kELSIF ... if_tail
Surely, it is the opposite of the previous examples. if_tail, which is the symbol of a list, is at the right side.
In fact, there’s another established way of expressing lists,
    list: END_ITEM
        | ITEM list
when you write in this way, it becomes a list that contains zero or more continuous ITEMs and ends with END_ITEM.
As an expression of a list, whichever is used does not make much difference, but the way that the actions are executed is fatally different. With the form in which list is written at the right, the actions are sequentially executed from the last ITEM. We’ve already learned about the behavior of the stack when list is at the left, so let’s try the case where list is at the right. The input is 4 ITEMs and END_ITEM.
                                         empty at first
    ITEM                                 shift ITEM
    ITEM ITEM                            shift ITEM
    ITEM ITEM ITEM                       shift ITEM
    ITEM ITEM ITEM ITEM                  shift ITEM
    ITEM ITEM ITEM ITEM END_ITEM         shift END_ITEM
    ITEM ITEM ITEM ITEM list             reduce END_ITEM to list
    ITEM ITEM ITEM list                  reduce ITEM list to list
    ITEM ITEM list                       reduce ITEM list to list
    ITEM list                            reduce ITEM list to list
    list                                 reduce ITEM list to list
                                         accept.
When list was at the left, shifts and reductions were done in turns. This time, as you see, there are continuous shifts and continuous reductions.
The reason why if_tail places “list at the right” is to create a syntax tree from the bottom up. When creating from the bottom up, the node of if will be left in hand in the end. But if if_tail were defined by placing “list at the left”, then in order to eventually leave the node of if in hand, it would need to traverse all links of the elsif and, every time an elsif is found, add it to the end. This is cumbersome. And slow. Thus, if_tail is constructed in the “list at the right” manner.
Finally, the meaning of the headline is that, in grammar terms, “the left is list” is called left-recursive and “the right is list” is called right-recursive. These terms are used mainly when reading papers about processing grammars or writing a book about yacc.
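The bottom-up construction described above can be imitated outside the parser. The sketch below is not ruby's code: IfNode is a hypothetical stand-in for NODE_IF, build_if_chain is a made-up helper, and symbols such as :cond1 stand in for real sub-trees. It processes the clauses from the last one backwards, in the same order in which the right-recursive if_tail actions fire.

```ruby
# Hypothetical stand-in for NODE_IF; member names follow nd_cond / nd_body / nd_else.
IfNode = Struct.new(:nd_cond, :nd_body, :nd_else)

# clauses: [[cond, body], ...] for the if and each elsif, in source order.
# else_body: what opt_else yields (nil when there is no else).
# The innermost (last) clause is turned into a node first, like the
# reductions of a right-recursive rule.
def build_if_chain(clauses, else_body)
  clauses.reverse.inject(else_body) do |tail, (cond, body)|
    IfNode.new(cond, body, tail)
  end
end

# The elsif form ...
ELSIF_TREE = build_if_chain(
  [[:cond1, :body1], [:cond2, :body2], [:cond3, :body3]], :body4
)

# ... and the explicitly nested else/if form.
NESTED_TREE = IfNode.new(:cond1, :body1,
                IfNode.new(:cond2, :body2,
                  IfNode.new(:cond3, :body3, :body4)))

puts ELSIF_TREE == NESTED_TREE
```

Each step only wraps the tail that is already in hand, so no traversal of the chain is needed, and the outermost if is what remains at the end.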
Trunk
Leaf, branch, and finally, it’s trunk. Let’s look at how the list of statements is joined.
▼ The Source Program
    7
    8
    9
The dump of the corresponding syntax tree is shown below. This is not nodedump-short but the complete form.
▼ Its Syntax Tree
    NODE_BLOCK
    nd_head:
        NODE_NEWLINE
        nd_file = "multistmt"
        nd_nth  = 1
        nd_next:
            NODE_LIT
            nd_lit = 7:Fixnum
    nd_next:
        NODE_BLOCK
        nd_head:
            NODE_NEWLINE
            nd_file = "multistmt"
            nd_nth  = 2
            nd_next:
                NODE_LIT
                nd_lit = 8:Fixnum
        nd_next:
            NODE_BLOCK
            nd_head:
                NODE_NEWLINE
                nd_file = "multistmt"
                nd_nth  = 3
                nd_next:
                    NODE_LIT
                    nd_lit = 9:Fixnum
            nd_next = (null)
We can see that a list of NODE_BLOCK is created and NODE_NEWLINE nodes are attached as headers. (Fig.5)
Fig.5: NODE_BLOCK and NODE_NEWLINE
It means that a NODE_NEWLINE is attached to each statement (stmt), and when there are multiple statements, they become a list of NODE_BLOCK. Let’s also see the code.
▼ stmts
    stmts           : none
                    | stmt
                        {
                            $$ = newline_node($1);
                        }
                    | stmts terms stmt
                        {
                            $$ = block_append($1, newline_node($3));
                        }
(parse.y)
newline_node() caps NODE_NEWLINE, block_append() appends it to the list. It’s straightforward. Let’s look at the content only of block_append().
block_append()
In this function, the error checks are in the very middle and get in the way. Thus I’ll show the code without that part.
▼ block_append() (omitted)
    static NODE*
    block_append(head, tail)
        NODE *head, *tail;
    {
        NODE *end;

        if (tail == 0) return head;
        if (head == 0) return tail;

        if (nd_type(head) != NODE_BLOCK) {
            end = NEW_BLOCK(head);
            end->nd_end = end;       /* (A-1) */
            fixpos(end, head);
            head = end;
        }
        else {
            end = head->nd_end;      /* (A-2) */
        }

        /* ……omitted…… */

        if (nd_type(tail) != NODE_BLOCK) {
            tail = NEW_BLOCK(tail);
            tail->nd_end = tail;
        }
        end->nd_next = tail;
        head->nd_end = tail->nd_end; /* (A-3) */
        return head;
    }
(parse.y)
According to the previous syntax tree dump, NODE_BLOCK is a linked list that uses nd_next. Being aware of that while reading, it can be read as “if either head or tail is not a NODE_BLOCK, wrap it with NODE_BLOCK and join the lists to each other.”
Additionally, on (A-1~3), the nd_end of the NODE_BLOCK at the head of the list always points to the NODE_BLOCK at the tail of the list. This is probably because in this way we don’t have to traverse all elements when adding an element to the tail (Fig.6). Conversely speaking, when you need to add elements later, NODE_BLOCK is suitable.
Fig.6: Appending is easy.
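To see why keeping nd_end in the head makes appending cheap, here is a small model of the non-omitted part of block_append(). It is a sketch with assumptions: BlockNode is a hypothetical stand-in for a NODE_BLOCK, is_a?(BlockNode) replaces the nd_type() test, plain integers stand in for statement nodes, and fixpos() and the error checks are left out.

```ruby
# Hypothetical stand-in for NODE_BLOCK with the three members used by block_append().
BlockNode = Struct.new(:nd_head, :nd_end, :nd_next)

# Like NEW_BLOCK() followed by "x->nd_end = x": a one-element list whose tail is itself.
def new_block(stmt)
  node = BlockNode.new(stmt, nil, nil)
  node.nd_end = node
  node
end

def block_append(head, tail)
  return head if tail.nil?
  return tail if head.nil?

  head = new_block(head) unless head.is_a?(BlockNode)   # (A-1)
  last = head.nd_end                                    # (A-2): no traversal needed
  tail = new_block(tail) unless tail.is_a?(BlockNode)

  last.nd_next = tail
  head.nd_end = tail.nd_end                             # (A-3)
  head
end

# Walks nd_next and collects the statements, only to inspect the result.
def block_to_a(head)
  items = []
  node = head
  while node
    items << node.nd_head
    node = node.nd_next
  end
  items
end

BLOCK_LIST = [7, 8, 9].inject(nil) { |list, stmt| block_append(list, stmt) }

puts block_to_a(BLOCK_LIST).inspect
puts BLOCK_LIST.nd_end.nd_head
```

Every append touches only the head and the current tail, however long the list already is. Note that only the head's nd_end is kept up to date in this model; the nd_end of inner nodes is not used.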
The two types of lists
Now, I’ve explained the outline so far. Because the structure of the syntax tree will also appear in Part 3 in large amounts, we won’t go further as long as we are in Part 2. But before ending, there’s one more thing I’d like to talk about. It is about the two general-purpose lists.
The two general-purpose lists are BLOCK and LIST. BLOCK is, as previously described, a linked list of NODE_BLOCK to join the statements. LIST is, although it is called LIST, a list of NODE_ARRAY. This is what is used for array literals. LIST is also used to store the arguments of a method or the list of multiple assignments.
As for the difference between the two lists, looking at the usage of the nodes is helpful for understanding.
    NODE_BLOCK    nd_head    holding an element
                  nd_end     pointing to the NODE_BLOCK of the end of the list
                  nd_next    pointing to the next NODE_BLOCK
    NODE_ARRAY    nd_head    holding an element
                  nd_alen    the length of the list that follows this node
                  nd_next    pointing to the next NODE_ARRAY
The usage differs only in the second elements, which are nd_end and nd_alen. And this is exactly the significance of the existence of each of the two node types. Since its size can be stored in NODE_ARRAY, we use an ARRAY list when the size of the list will frequently be required. Otherwise, we use a BLOCK list, which is very fast to join. I don’t describe this topic in detail because the code that uses them is necessary to understand the significance but is not shown here, but when that code appears in Part 3, I’d like you to recall this and think “Oh, this uses the length”.
Semantic Analysis
As I briefly mentioned at the beginning of Part 2, there are two types of analysis: appearance analysis and semantic analysis. The appearance analysis is mostly done by yacc; the rest is doing the semantic analysis inside actions.
Errors inside actions
What does semantic analysis precisely mean? For example, there are type checks in a language that has types. Alternatively, checking that variables with the same name are not defined multiple times, that variables are not used before their definitions, that the procedure being used is defined, that return is not used outside of procedures, and so on. These are part of the semantic analysis.
What kind of semantic analysis is done in the current ruby? Since error checks occupy almost all of the semantic analysis in ruby, searching for the places where errors are generated seems a good way. In a yacc parser, yyerror() is supposed to be called when an error occurs. Conversely speaking, there’s an error where yyerror() exists. So, I made a list of the places where yyerror() is called inside the actions.
    an expression not having its value (void value expression) at a place where a value is required
    an alias of $n
    BEGIN inside of a method
    END inside of a method
    return outside of methods
    a local variable at a place where constant is required
    a class statement inside of a method
    an invalid parameter variable ($gvar and CONST and such)
    parameters with the same name appear twice
    an invalid receiver of a singleton method (def ().method and such)
    a singleton method definition on literals
    an odd number of a list for hash literals
    an assignment to self/nil/true/false/__FILE__/__LINE__
    a constant assignment inside of a method
    a multiple assignment inside of a conditional expression
These checks can roughly be categorized by purpose as follows:
    for the better error message
    in order not to make the rule too complex
    the others (pure semantic analysis)
For example, “return outside of a method” is a check in order not to make the rule too complex. Since this error is a problem of the structure, it can be dealt with by grammar. For example, it’s possible by defining the rules separately for inside and outside of methods and listing what is allowed and what is not allowed in each. But this is cumbersome in any case, and rejecting it in an action is far more concise.
And “an assignment to self” seems to be a check for the better error message. In comparison to “return outside of methods”, rejecting it by grammar is much easier, but if it is rejected by the parser, the output would be just "parse error". Compared to that, the current
    % ruby -e 'self = 1'
    -e:1: Can't change the value of self
    self = 1
          ^
this error is much more friendly.
Of course, we cannot always say that an arbitrary rule is exactly “for this purpose”. For example, “return outside of methods” can also be considered a check “for the better error message”. The purposes overlap each other.
Now, the problem is “a pure semantic analysis”. In Ruby there are few things that belong to this category. In the case of a typed language, the type analysis is a big event, but because variables are not typed in Ruby, it is meaningless. What stands out instead is the check of whether an expression has a value.
To put “having its value” precisely, it means “you can obtain a value as a result of evaluating it”. return and break do not have values by themselves. Of course, a value is passed to the place that return returns to, but no value is left at the place where return is written. Therefore, for example, the next expression is odd,
    i = return(1)
Since this kind of expression is clearly due to misunderstanding or simple mistakes, it’s better to reject it when compiling. Next, we’ll look at value_expr, which is one of the functions that check whether an expression has a value.
value_expr()
value_expr() is the function that checks whether something is an expr that has a value.
▼ value_expr()
    static int
    value_expr(node)
        NODE *node;
    {
        while (node) {
            switch (nd_type(node)) {
              case NODE_CLASS:
              case NODE_MODULE:
              case NODE_DEFN:
              case NODE_DEFS:
                rb_warning("void value expression");
                return Qfalse;

              case NODE_RETURN:
              case NODE_BREAK:
              case NODE_NEXT:
              case NODE_REDO:
              case NODE_RETRY:
                yyerror("void value expression");
                /* or "control never reach"? */
                return Qfalse;

              case NODE_BLOCK:
                while (node->nd_next) {
                    node = node->nd_next;
                }
                node = node->nd_head;
                break;

              case NODE_BEGIN:
                node = node->nd_body;
                break;

              case NODE_IF:
                if (!value_expr(node->nd_body)) return Qfalse;
                node = node->nd_else;
                break;

              case NODE_AND:
              case NODE_OR:
                node = node->nd_2nd;
                break;

              case NODE_NEWLINE:
                node = node->nd_next;
                break;

              default:
                return Qtrue;
            }
        }

        return Qtrue;
    }
(parse.y)
Algorithm
Summary: It sequentially checks the nodes of the tree, and if it hits “an expression certainly not having its value”, it means the tree does not have any value. Then it reports that by using rb_warning() (or yyerror()) and returns Qfalse. If it finishes traversing the entire tree without hitting any “expression not having its value”, it means the tree does have a value. Thus it returns Qtrue.
Here, notice that it does not always need to check the whole tree. For example, let’s assume value_expr() is called on the argument of a method. Here:
▼ check the value of arg by using value_expr()
    arg_value       : arg
                        {
                            value_expr($1);
                            $$ = $1;
                        }
(parse.y)
Inside of this argument $1, there can also be other nested method calls. But the argument of the inner method must have already been checked with value_expr(), so you don’t have to check it again.
Let’s think more generally. Assume an arbitrary grammar element A exists, and assume value_expr() has been called on all of its composing elements; then the necessity to check the element A again would disappear.
Then, for example, how about if? Can it be handled as if value_expr() had already been called for all of its elements? To state only the conclusion, it can’t. That is because, since if is a statement (which does not use a value), its main body does not have to return a value. For example, in the next case:
    def method
      if true
        return 1
      else
        return 2
      end
      5
    end
This if statement does not need a value. But in the next case, its value is necessary.
    def method( arg )
      tmp = if arg
            then 3
            else 98
            end
      tmp * tmp / 3.5
    end
So, in this case, the if statement must be checked when checking the entire assignment expression. This kind of thing is laid out in the switch statement of value_expr().
Removing Tail Recursion
By the way, when looking over the whole value_expr, we can see that the following pattern appears frequently:
    while (node) {
        switch (nd_type(node)) {
          case NODE_XXXX:
            node = node->nd_xxxx;
            break;
             :
             :
        }
    }
This expression carries the same meaning after being modified to the following:
    return value_expr(node->nd_xxxx)
Code like this, which makes a recursive call just before return, is called a tail recursion. It is known that this can generally be converted to goto. This method is often used when optimizing. As for Scheme, it is defined in the specification that tail recursions must be removed by language processors. This is because recursions are often used instead of loops in Lisp-like languages.
However, be careful that it is a tail recursion only when “calling just before return”. For example, take a look at the NODE_IF of value_expr(),
    if (!value_expr(node->nd_body)) return Qfalse;
    node = node->nd_else;
    break;
As shown above, the first one is a recursive call. Rewriting this to the form that uses return,
    return value_expr(node->nd_body) && value_expr(node->nd_else);
If the left value_expr() is true, the right value_expr() is also executed. In this case, the left value_expr() is not “just before” return. Therefore, it is not a tail recursion. Hence, it can’t be converted to goto.
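The equivalence between the loop and the recursion can be tried on a reduced model. The following is a sketch, not a port: Node is a hypothetical struct whose members a/b/c stand in for u1/u2/u3, only a subset of the node types is handled (no NODE_BLOCK, NODE_BEGIN, NODE_AND or NODE_OR), and no warning or error is reported.

```ruby
# Hypothetical node: type plus three slots standing in for u1 / u2 / u3.
Node = Struct.new(:type, :a, :b, :c)

# Node types that certainly have no value in this reduced model.
VOID_TYPES = [:class, :module, :defn, :defs,
              :return, :break, :next, :redo, :retry].freeze

# Recursive form: each "node = ...; break" of the C code is written as a call
# in tail position. For :if, only the second call is in tail position.
def value_expr_recursive(node)
  return true if node.nil?
  case node.type
  when *VOID_TYPES then false
  when :if         then value_expr_recursive(node.b) && value_expr_recursive(node.c)
  when :newline    then value_expr_recursive(node.c)
  else true
  end
end

# Loop form, shaped like the C function: tail calls became "node = ..." and
# another turn of the loop; the check of the if body stays a real call.
def value_expr_loop(node)
  while node
    case node.type
    when *VOID_TYPES
      return false
    when :if
      return false unless value_expr_loop(node.b)
      node = node.c
    when :newline
      node = node.c
    else
      return true
    end
  end
  true
end

LIT = Node.new(:lit, 1)
RET = Node.new(:return)
SAMPLE_TREES = [
  LIT,
  RET,
  Node.new(:newline, nil, nil, LIT),
  Node.new(:if, LIT, LIT, LIT),
  Node.new(:if, LIT, RET, LIT),
  Node.new(:if, LIT, LIT, Node.new(:newline, nil, nil, RET)),
  Node.new(:if, LIT, LIT, nil)
]

SAMPLE_TREES.each do |tree|
  puts [value_expr_recursive(tree), value_expr_loop(tree)].inspect
end
```

In this model both forms give the same answer for every sample tree. The call on the if body cannot be turned into a loop step, because its result still has to be examined after it returns.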
The whole picture of the value check
As for value checks, we won’t read the functions further. You might think it’s too early, but all of the other functions, just like value_expr(), only traverse and check nodes one by one, step by step, so they are not interesting at all. However, I’d like to cover the whole picture at least, so I finish this section by just showing the call graph of the relevant functions (Fig.7).
Fig.7: the call graph of the value check functions
Local Variables
Local Variable Definitions
The variable definitions in Ruby are really various. As for constants and class variables, these are defined on the first assignment. As for instance variables and global variables, since all names can be considered to be already defined, you can refer to them without assigning beforehand (although it produces warnings).
The definitions of local variables are again completely different from all of the above. A local variable is defined when its assignment appears in the program. For example, as follows:
    lvar = nil
    p lvar      # being defined
In this case, as the assignment to lvar is written on the first line, lvar is defined at that moment. When it is undefined, it ends up with a runtime exception NameError as follows:
    % ruby lvar.rb
    lvar.rb:1: undefined local variable or method `lvar' for #<Object:0x40163a9c> (NameError)
Why does it say "local variable or method"? As for methods, the parentheses of the arguments can be omitted when calling, so when there are no arguments, a method call can’t be distinguished from a local variable. To resolve this situation, ruby tries to call it as a method when it finds an undefined local variable. Then if the corresponding method is not found, it generates an error such as the above one.
By the way, it is defined when “it appears”, which means it is defined even though it was not assigned. The initial value of a defined variable is nil.
    if false
      lvar = "this assignment will never be executed"
    end
    p lvar      # shows nil
Moreover, since it is defined “when” it “appears”, the definition has to be before the reference in the symbol sequence. For example, in the next case, it is not defined.
    p lvar       # not defined !
    lvar = nil   # although appearing here ...
Be careful about the point of “in the symbol sequence”. It has nothing at all to do with the order of evaluation. For example, for the next code, naturally the condition expression is evaluated first, but in the symbol sequence, at the moment when p appears the assignment to lvar has not appeared yet. Therefore, this produces NameError.
    p(lvar) if lvar = true
What we’ve learned by now is that local variables are extremely influenced by appearances. When a symbol sequence that expresses an assignment appears, the variable is defined, in order of appearance. Based on this information, we can infer that ruby seems to define local variables while parsing, because the order of the symbol sequence does not exist after leaving the parser. And in fact, it is true. In ruby, the parser defines local variables.
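Both effects can be observed from plain Ruby without looking inside the parser. The sketch below wraps the two snippets in methods so that the results can be returned instead of printed; the method names and the helper always_true are made up for illustration, and the behavior shown is what I expect from current Ruby versions as well as the one described here.

```ruby
# Helper so that the condition is not a literal.
def always_true
  true
end

# The assignment is never executed, but it appears before the reference,
# so the parser has already defined lvar and its value is nil.
def defined_by_appearance
  if false
    lvar = "this assignment will never be executed"
  end
  lvar
end

# The condition is evaluated first, but in the symbol sequence the reference
# comes before the assignment, so the first lvar is treated as a method call.
def referenced_too_early
  return lvar if lvar = always_true
rescue NameError
  :name_error
end

p defined_by_appearance
p referenced_too_early
```

The first method returns nil without raising, and the second ends in the rescue clause, which matches the explanation that definition follows the order of appearance rather than the order of evaluation.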
BlockLocalVariablesThelocalvariablesnewlydefinedinaniteratorblockarecalled
blocklocalvariablesordynamicvariables.Blocklocalvariablesare,inlanguagespecifications,identicaltolocalvariables.However,thesetwodifferintheirimplementations.We’lllookathowisthedifferencefromnowon.
The data structure
We'll start with the local variable table struct local_vars.
▼ struct local_vars
    static struct local_vars {
        ID *tbl;                    /* the table of local variable names */
        int nofree;                 /* whether it is used from outside */
        int cnt;                    /* the size of the tbl array */
        int dlev;                   /* the nesting level of dyna_vars */
        struct RVarmap* dyna_vars;  /* block local variable names */
        struct local_vars *prev;
    } *lvtbl;
(parse.y)
The member name prev indicates that struct local_vars is a linked list in the reverse direction. Based on this, we can expect a stack. The global variable lvtbl, declared at the same time, points to the local_vars that is the top of that stack.
struct RVarmap is defined in env.h, so it is available to other files and is also used by the evaluator. It is used to store the block local variables.
▼ struct RVarmap
    struct RVarmap {
        struct RBasic super;
        ID id;                  /* the variable name */
        VALUE val;              /* its value */
        struct RVarmap *next;
    };
(env.h)
Since there's a struct RBasic at the top, this is a Ruby object. That means it is managed by the garbage collector. And since it is joined by the next member, it is probably a linked list.
Based on the observations so far and on information that will be explained later, Fig.8 illustrates both structs while the parser is running.
Fig.8: The image of local variable tables at runtime
Local Variable Scope
Looking over the list of function names in parse.y, we find functions such as local_push(), local_pop() and local_cnt(). However you look at them, they appear to be related to local variables. Moreover, because the names are push and pop, this is clearly a stack. So first, let's find the places where these functions are used.
▼ local_push() local_pop() used examples
            | kDEF fname
                {
                    $<id>$ = cur_mid;
                    cur_mid = $2;
                    in_def++;
                    local_push(0);
                }
              f_arglist
              bodystmt
              kEND
                {
                    /* NOEX_PRIVATE for toplevel */
                    $$ = NEW_DEFN($2, $4, $5,
                          class_nest ? NOEX_PUBLIC : NOEX_PRIVATE);
                    if (is_attrset_id($2))
                        $$->nd_noex = NOEX_PUBLIC;
                    fixpos($$, $4);
                    local_pop();
                    in_def--;
                    cur_mid = $<id>3;
                }
(parse.y)
I found them used at def. They can also be found in class definitions, singleton class definitions, and module definitions. In other words, they appear at the places where the scope of local variables is cut. As for how they are used, push is done where the method definition starts and pop is done when the definition ends. This means, as we expected, that the functions starting with local_ are almost certainly related to local variables. It also reveals that the part between push and pop is probably a local variable scope.
I also searched for local_cnt().
▼ NEW_LASGN()
    #define NEW_LASGN(v,val) rb_node_newnode(NODE_LASGN,v,val,local_cnt(v))
(node.h)
This is found in node.h. Even though it is also used in places inside parse.y, I went and found it in another file. I was probably getting desperate.
NEW_LASGN means “new local assignment”, which should be the node of an assignment to a local variable. Considering also the places where it is used, the parameter v is apparently the local variable name. val is probably (a syntax tree that represents) the right-hand side value.
Based on the above observations, local_push() is called at the beginning of a local variable scope, local_cnt() is used to add a local variable when an assignment to one appears partway through, and local_pop() is used when the scope ends. This perfect scenario emerges. (Fig.9)
Fig.9: the flow of the local variable management
Then, let's look at the content of the functions.
push and pop
▼ local_push()
    static void
    local_push(top)
        int top;
    {
        struct local_vars *local;

        local = ALLOC(struct local_vars);
        local->prev = lvtbl;
        local->nofree = 0;
        local->cnt = 0;
        local->tbl = 0;
        local->dlev = 0;
        local->dyna_vars = ruby_dyna_vars;
        lvtbl = local;
        if (!top) {
            /* preserve the variable table of the previous scope into val */
            rb_dvar_push(0, (VALUE)ruby_dyna_vars);
            ruby_dyna_vars->next = 0;
        }
    }
(parse.y)
As we expected, struct local_vars seems to be used as a stack, and we can see that lvtbl points to the top of that stack. The lines related to rb_dvar_push() will be read later, so we leave them untouched for now.
Next, we'll look at local_pop() and local_tbl() at the same time.
▼ local_tbl local_pop
    static ID*
    local_tbl()
    {
        lvtbl->nofree = 1;
        return lvtbl->tbl;
    }

    static void
    local_pop()
    {
        struct local_vars *local = lvtbl->prev;

        if (lvtbl->tbl) {
            if (!lvtbl->nofree) free(lvtbl->tbl);
            else lvtbl->tbl[0] = lvtbl->cnt;
        }
        ruby_dyna_vars = lvtbl->dyna_vars;
        free(lvtbl);
        lvtbl = local;
    }
(parse.y)
I'd like you to look at local_tbl(). This is the function to obtain the current local variable table (lvtbl->tbl). Calling it sets nofree of the current table to true. nofree naturally seems to mean “don't free()”. In other words, this works like reference counting: “this table will be used, so please don't free() it”. Conversely, when local_tbl() was never called on a table, that table is freed and discarded at the moment it is popped. This probably happens, for example, with a method that has no local variables.
However, the “necessary table” here means lvtbl->tbl. As you can see, lvtbl itself is freed at the moment it is popped. This means that only the generated lvtbl->tbl is used in the evaluator. So the structure of lvtbl->tbl becomes important. Let's look at local_cnt(), the function that (apparently) adds variables, which should help us understand that structure.
Before that, I'd like you to remember that lvtbl->cnt is stored at index 0 of lvtbl->tbl.
Adding variables
The function that (apparently) adds a local variable is local_cnt().
▼ local_cnt()
    static int
    local_cnt(id)
        ID id;
    {
        int cnt, max;

        if (id == 0) return lvtbl->cnt;

        for (cnt=1, max=lvtbl->cnt+1; cnt<max; cnt++) {
            if (lvtbl->tbl[cnt] == id) return cnt-1;
        }
        return local_append(id);
    }
(parse.y)
This scans lvtbl->tbl and searches for the entry equal to id. If it is found, it simply returns cnt-1. If nothing is found, it calls local_append(). Since it is called append, local_append() must be the procedure that appends. In other words, local_cnt() checks whether the variable is already registered, and if it is not, adds it with local_append() and returns the result.
What does the return value of this function mean? lvtbl->tbl seems to be an array of the variables, so there is a one-to-one correspondence between the variable names and “their index - 1 (cnt-1)”. (Fig.10)
Fig.10: The correspondences between the variable names and the return values
Moreover, since this return value is calculated so that it starts from 0, the local variable space is probably an array, and this returns the index used to access that array. If it were not, (the ID of) the variable name could have been used as a key in the first place, as with instance variables or constants.
You might wonder why it avoids index 0 (the loop starts from cnt=1). It is probably in order to store a value there at local_pop().
Based on what we've learned, we can understand the role of local_append() without actually looking at its content. It registers a local variable and returns “(the index of the variable in lvtbl->tbl) - 1”. It is shown below, so let's make sure.
▼ local_append()
    static int
    local_append(id)
        ID id;
    {
        if (lvtbl->tbl == 0) {
            lvtbl->tbl = ALLOC_N(ID, 4);
            lvtbl->tbl[0] = 0;
            lvtbl->tbl[1] = '_';
            lvtbl->tbl[2] = '~';
            lvtbl->cnt = 2;
            if (id == '_') return 0;
            if (id == '~') return 1;
        }
        else {
            REALLOC_N(lvtbl->tbl, ID, lvtbl->cnt+2);
        }

        lvtbl->tbl[lvtbl->cnt+1] = id;
        return lvtbl->cnt++;
    }
(parse.y)
It seems definitely true. lvtbl->tbl is an array of the local variable names, and its index - 1 is the return value (the local variable ID).
Note that it increases lvtbl->cnt. Since this is the only code that increases lvtbl->cnt, its meaning can be decided from this code alone. Then, what is the meaning? Since “lvtbl->cnt increases by 1 when a new variable is added”, “lvtbl->cnt holds the number of local variables in this scope”.
Finally, I'll explain tbl[1] and tbl[2]. As you can guess if you are familiar with Ruby, these '_' and '~' are the special variables named $_ and $~. Though they look identical to global variables, they are actually local variables. Even if you don't use them explicitly, they are implicitly assigned when methods such as Kernel#gets are called, so their space must always be allocated.
Summary of local variables
Since the description of local variables was complex in various ways, let's summarize it.
First, local variables seem to differ from the other variables in that they are not managed with st_table. Then, where are they stored? The answer seems to be an array. Moreover, they are stored in a different array for each scope.
The array is lvtbl->tbl, and index 0 holds lvtbl->cnt, which is set at local_pop(). In other words, it holds the number of local variables. Index 1 and above hold the local variable names defined in the scope. Fig.11 shows the final appearance we expect.
Fig.11: correspondences between local variable names and the return values
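The table layout summarized above can be modeled in a few lines of Ruby. This is only a sketch of the C code quoted from parse.y, not the real implementation: the class name LocalTableModel and the method pop_table are invented for illustration, symbols stand in for IDs, and nil stands in for the ID 0.

```ruby
# Sketch: a Ruby model of lvtbl->tbl and lvtbl->cnt (names are hypothetical).
class LocalTableModel
  attr_reader :cnt

  def initialize
    @tbl = nil   # allocated lazily, as in local_append()
    @cnt = 0
  end

  # Mirrors local_cnt(): return the variable's index - 1, registering it if new.
  def local_cnt(id)
    return @cnt if id.nil?
    (1..@cnt).each { |i| return i - 1 if @tbl[i] == id }
    local_append(id)
  end

  # Mirrors local_append(): slots 1 and 2 are reserved for $_ and $~.
  def local_append(id)
    if @tbl.nil?
      @tbl = [0, :_, :~]
      @cnt = 2
      return 0 if id == :_
      return 1 if id == :~
    end
    @tbl[@cnt + 1] = id
    index = @cnt
    @cnt += 1
    index
  end

  # Mirrors the nofree branch of local_pop(): the count goes to index 0.
  def pop_table
    @tbl[0] = @cnt if @tbl
    @tbl
  end
end

table = LocalTableModel.new
p table.local_cnt(:a)
p table.local_cnt(:b)
p table.pop_table
```

Because two slots are always reserved, the first ordinary variable registered in a scope receives the return value 2, not 0.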
Block Local Variables
What remains is dyna_vars, a member of struct local_vars. In other words, this is about the block local variables. I thought there must be functions that do something with it, looked over the list of function names, and found them as expected. There are suspicious functions named dyna_push(), dyna_pop() and dyna_in_block(). Here is a place where they are used.
▼ an example using dyna_push dyna_pop
    brace_block     : '{'
                        {
                            $<vars>$ = dyna_push();
                        }
                      opt_block_var
                      compstmt '}'
                        {
                            $$ = NEW_ITER($3, 0, $4);
                            fixpos($$, $4);
                            dyna_pop($<vars>2);
                        }
(parse.y)
push at the beginning of an iterator block, pop at the end. This must be the processing of block local variables.
Now, we are going to look at the functions.
▼ dyna_push()
    static struct RVarmap*
    dyna_push()
    {
        struct RVarmap* vars = ruby_dyna_vars;

        rb_dvar_push(0, 0);
        lvtbl->dlev++;
        return vars;
    }
(parse.y)
Increasing lvtbl->dlev seems to be the mark that indicates the existence of a block local variable scope. Meanwhile, rb_dvar_push() is ...
▼ rb_dvar_push()
    void
    rb_dvar_push(id, value)
        ID id;
        VALUE value;
    {
        ruby_dyna_vars = new_dvar(id, value, ruby_dyna_vars);
    }
(eval.c)
It creates a struct RVarmap that has the variable name id and the value val as its members, and adds it to the top of the list held by the global variable ruby_dyna_vars. This is, yet again, the form of cons. In dyna_push(), ruby_dyna_vars is not set aside; it seems to add directly onto the ruby_dyna_vars of the previous scope.
Moreover, the value of the id member of the RVarmap added here is 0. Although it was not seriously discussed in this book, an ID of ruby is never 0 as long as it is created normally by rb_intern(). Thus, we can infer that this RVarmap, like NUL or NULL, probably plays the role of a sentinel. Under this assumption, we can explain why a variable holder (RVarmap) is added even though no variable is added.
Next, dyna_pop().
▼ dyna_pop()
    static void
    dyna_pop(vars)
        struct RVarmap* vars;
    {
        lvtbl->dlev--;
        ruby_dyna_vars = vars;
    }
(parse.y)
By decreasing lvtbl->dlev, it records the fact that the block local variable scope has ended. Something seems to be done with the argument, but let's look at that all at once later.
The place where a block local variable is added has not appeared yet.
Something like the local_cnt() of local variables is missing. So I did plenty of grep with dvar and dyna, and this code was found.
▼ assignable() (partial)
    static NODE*
    assignable(id, val)
        ID id;
        NODE *val;
    {
                                :
                rb_dvar_push(id, Qnil);
                return NEW_DASGN_CURR(id, val);
(parse.y)
assignable() is the function that creates a node related to assignments. This citation is the fragment of that function that deals with block local variables only. It seems to add a new variable (to ruby_dyna_vars) by using the rb_dvar_push() we've just seen.
ruby_dyna_vars in the parser
Now, taking all of the above into consideration, let's imagine what ruby_dyna_vars looks like at the moment when a local variable scope has finished being parsed.
First, as I said previously, the RVarmap with id=0 that is added at the beginning of a block scope is a sentinel which represents a break between two block scopes. We'll call this “the header of ruby_dyna_vars”.
Next, among the previously shown actions of the rule for the iterator block, I'd like you to focus on this part:
    $<vars>$ = dyna_push();    /* what assigned into $<vars>$ is ... */
            :
            :
    dyna_pop($<vars>2);        /* ...... appears at $<vars>2 */
dyna_push() returns the ruby_dyna_vars at that moment. dyna_pop() puts its argument into ruby_dyna_vars. This means ruby_dyna_vars is saved and restored for each block local variable scope. Therefore, when parsing the following program,
    iter {
        a = nil
        iter {
            b = nil
            iter {
                c = nil
                # nesting level 3
            }
            bb = nil
            # nesting level 2
            iter {
                e = nil
            }
        }
        # nesting level 1
    }
Fig.12 shows the ruby_dyna_vars in this situation.
Fig.12: ruby_dyna_vars when all scopes are finished to be parsed
This structure is fairly smart, because the variables of the higher levels can naturally be accessed by traversing the whole list, even if the nesting level is deep. The search process is simpler this way than it would be if a different table were created for each level.
Also, in the figure it looks like bb is hung at a strange place, but this is correct. When a variable is found after the nest level has increased once and then decreased, it is attached after the end of the list of the original level. Moreover, in this way, the local variable specification that “only the variables which already exist in the symbol sequence are defined” is expressed in a natural form.
And finally, at each cut of local variable scopes (not of block local variable scopes), this whole link is saved to or restored from lvtbl->dyna_vars. I'd like you to go back a little and check local_push() and local_pop().
By the way, although creating the ruby_dyna_vars list was a huge task, the list itself is not used by the evaluator. It is used only to check the existence of variables and will be garbage collected at the moment parsing is finished. After entering the evaluator, another chain is created all over again. There's quite a deep reason for this, and we'll look at it again in Part 3.
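The save-and-restore behaviour of dyna_push() and dyna_pop() can be sketched in Ruby as follows. This models only the functions quoted above. The names Varmap, DynaVarsModel and defined_var? are invented for this illustration, and nil plays the role of the sentinel id 0.

```ruby
# Sketch: ruby_dyna_vars as a cons-style linked list (names are hypothetical).
Varmap = Struct.new(:id, :val, :next_var)

class DynaVarsModel
  attr_reader :head   # corresponds to ruby_dyna_vars

  def initialize
    @head = nil
  end

  # Mirrors rb_dvar_push(): put a new cell in front of the current list.
  def dvar_push(id, val)
    @head = Varmap.new(id, val, @head)
  end

  # Mirrors dyna_push(): remember the old head, then add the sentinel header.
  def dyna_push
    saved = @head
    dvar_push(nil, nil)
    saved
  end

  # Mirrors dyna_pop(): restore the head saved by dyna_push().
  def dyna_pop(saved)
    @head = saved
  end

  # Walk the whole list, so that outer block levels are searched as well.
  def defined_var?(id)
    cell = @head
    while cell
      return true if cell.id == id
      cell = cell.next_var
    end
    false
  end
end

vars = DynaVarsModel.new
outer = vars.dyna_push          # iter {
vars.dvar_push(:a, nil)         #   a = nil
inner = vars.dyna_push          #   iter {
vars.dvar_push(:b, nil)         #     b = nil
p vars.defined_var?(:a)         #     a is reachable from the inner block
vars.dyna_pop(inner)            #   }
p vars.defined_var?(:b)         #   b is no longer reachable here
vars.dyna_pop(outer)            # }
```

A single traversal from the head is enough to see the enclosing levels, which is the point made above about not needing one table per level.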
The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
Chapter 13: Structure of the evaluator
Outline
Interface
We are not familiar with the word “Hyo-ka-ki” (evaluator). Literally, it must be a “-ki” (device) for “hyo-ka” (evaluating). Then, what is “hyo-ka”?
“Hyo-ka” is the standard translation of “evaluate”. However, when the subject is programming languages, it can be considered a mistranslation. It is hard to avoid the impression that the word “hyo-ka” is about “whether something is good or bad”.
“Evaluate” in the context of programming languages has nothing to do with “good or bad”; its meaning is closer to “speculating” or “executing”. The origin of “evaluate” is the Latin “ex + value + ate”. Translated directly, it is “turn it into a value”. This may be the simplest way to understand it: to determine the value of an expression expressed in text.
Frankly speaking, the bottom line is that evaluating is executing a written expression and getting its result. Then why is it not just called “execute”? Because evaluating is not only executing.
For example, in an ordinary programming language, when we write “3”, it is dealt with as the integer 3. This situation is sometimes described as “the result of evaluating “3” is 3”. It is hard to say that an expression consisting of a constant is executed, but it is certainly an evaluation. It would be perfectly all right for a programming language to exist in which the letter “3”, when evaluated, is dealt with (evaluated) as the integer 6.
I'll introduce another example. When an expression consists of multiple constants, the constants are sometimes calculated during the compiling process (constant folding). We usually don't call this “executing”, because executing refers to the process of the created binary actually running. However, no matter when it is calculated, you get the same result from the same program.
In other words, “evaluating” is usually equal to “executing”, but essentially “evaluating” is different from “executing”. For now, this is the only point I'd like you to remember.
The characteristics of ruby's evaluator
The biggest characteristic of ruby's evaluator, which is also true of the ruby interpreter as a whole, is that the difference in expressiveness between C-level code (extension libraries) and Ruby-level code is small. In ordinary programming languages, the portion of the interpreter's features that can be used from extension libraries is usually very limited, but in ruby there are remarkably few limits. Defining classes, defining methods and calling methods without limitation can be taken for granted. We can also use exception handling and iterators. Furthermore, threads.
But we have to pay for the convenience somewhere. Some code is weirdly hard to implement, some code has a lot of overhead, and there are many places where almost the same thing is implemented twice, once for C and once for Ruby.
Additionally, ruby is a dynamic language, which means that you can construct and evaluate a string at runtime. That is done by eval, a function-like method. As you might expect, it is named after “evaluate”. By using it, you can even do something like this:
    lvar = 1
    answer = eval("lvar + lvar")    # the answer is 2
There are also Module#module_eval and Object#instance_eval, and each method behaves slightly differently. I'll describe them in detail in Chapter 17: Dynamic evaluation.
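To give a rough feeling for how the three methods differ before Chapter 17, here is a minimal sketch. The class EvalDemoCounter and the helper methods are invented for this illustration. It shows only the most visible difference, namely what the evaluated string sees as its context.

```ruby
# Sketch: eval, instance_eval and module_eval side by side.
# EvalDemoCounter and the helper method names are hypothetical.
class EvalDemoCounter
  def initialize
    @count = 41
  end
end

# eval sees the local variables of the place where it is called.
def eval_with_local
  lvar = 1
  eval("lvar + lvar")
end

# instance_eval runs the string with self set to the receiver,
# so the receiver's instance variables are visible.
def peek_with_instance_eval
  EvalDemoCounter.new.instance_eval("@count + 1")
end

# module_eval runs the string in the context of the class,
# so a def inside it adds an instance method to that class.
def add_method_with_module_eval
  EvalDemoCounter.module_eval("def doubled; @count * 2; end")
  EvalDemoCounter.new.doubled
end

p eval_with_local
p peek_with_instance_eval
p add_method_with_module_eval
```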
eval.c
The evaluator is implemented in eval.c. However, eval.c is a really huge file: it has 9000 lines, its size is 200K bytes, and it contains 309 functions. It is a tough opponent. When a file grows to this size, it is impossible to figure out its structure just by looking it over.
So what can we do? First, the bigger the file, the less likely it is that its content is not divided up at all. In other words, its inside must be modularized into small portions. Then, how can we find the modules? I'll list some ways.
The first way is to print the list of defined functions and look at their prefixes. There are plenty of functions with prefixes such as rb_dvar_, rb_mod_ and rb_thread. Each prefix clearly indicates a group of functions of the same type.
Alternatively, as we can tell from looking at the code of the class libraries, Init_xxxx() is always put at the end of a block in ruby. Therefore, Init_xxxx() also indicates a break between modules.
Additionally, the names are obviously important, too. Since eval(), rb_eval() and eval_node() appear close to each other, we naturally think there should be a deep relationship among them.
Finally, in the source code of ruby, the definitions of types or variables and the declarations of prototypes often indicate a break between modules.
Looking at the file with these points in mind, it seems that eval.c can mainly be divided into the modules listed below:
Safe Level                   already explained in Chapter 7: Security
Method Entry Manipulations   finding or deleting syntax trees which are actual method bodies
Evaluator Core               the heart of the evaluator that rb_eval() is at its center
Exception                    generations of exceptions and creations of backtraces
Method                       the implementation of method call
Iterator                     the implementation of functions that are related to blocks
Load                         loading and evaluating external files
Proc                         the implementation of Proc
Thread                       the implementation of Ruby threads
Among them, “Load” and “Thread” are parts that essentially should not be in eval.c. They are in eval.c merely because of the restrictions of the C language. To put it more precisely, they need macros such as PUSH_TAG that are defined in eval.c. So I decided to exclude these two topics from Part 3 and deal with them in Part 4. And it is probably all right not to explain the safe level here, because I have already done so in Part 1.
Excluding the above three, six items are left to be described. The table below shows the corresponding chapter for each of them:
Method Entry Manipulations   the next chapter: Context
Evaluator Core               the entire part of Part 3
Exception                    this chapter
Method                       Chapter 15: Methods
Iterator                     Chapter 16: Blocks
Proc                         Chapter 16: Blocks
From main by way of ruby_run to rb_eval
Call Graph
The true core of the evaluator is a function called rb_eval(). In this chapter, we will follow the path from main() to that rb_eval(). First of all, here is a rough call graph around rb_eval:
    main                     ....main.c
        ruby_init                ....eval.c
            ruby_prog_init           ....ruby.c
        ruby_options             ....eval.c
            ruby_process_options     ....ruby.c
        ruby_run                 ....eval.c
            eval_node
                rb_eval
                    *
            ruby_stop
I put the file names on the right side where the call moves to another file. Looking at this carefully, the first thing we notice is that the functions of eval.c call back the functions of ruby.c.
I wrote “calling back” because main.c and ruby.c are, relatively speaking, for the implementation of the ruby command. eval.c is the implementation of the evaluator itself, which keeps a little distance from the ruby command. In other words, eval.c is supposed to be used by ruby.c, and calling the functions of ruby.c from eval.c makes eval.c less independent.
Then, why is it this way? It is mainly because of the restrictions of the C language. Because functions such as ruby_prog_init() and ruby_process_options() start to use the API of the ruby world, an exception may occur. However, in order to stop a Ruby exception, it is necessary to use the macro named PUSH_TAG(), which can only be used in eval.c. In other words, ruby_init() and ruby_run() essentially should have been defined in ruby.c.
Then, why isn't PUSH_TAG an extern function or something else that is available to other files? Actually, PUSH_TAG can only be used as a pair with POP_TAG, as follows:
    PUSH_TAG();
    /* do lots of things */
    POP_TAG();
Because of their implementation, the two macros have to be put into the same function. It would be possible to implement them in a way that allows them to be divided between different functions, but this is not done because it would be slower.
The next thing we notice is that calling the functions named ruby_xxxx sequentially from main() seems very meaningful. Since they are so obviously symmetric, it would be odd if there were no relationship.
Actually, these three functions are deeply related. Simply put, all three are “built-in Ruby interfaces”. That is, they are used only when creating a command with a built-in ruby interpreter, and not when writing extension libraries. Since the ruby command itself can, in theory, be considered one of the programs with built-in Ruby, it is natural for it to use these interfaces.
What is the ruby_ prefix? So far, all of ruby's functions have been prefixed with rb_. Why are there two types, rb_ and ruby_? I investigated but could not understand the difference, so I asked directly. The answer was, “ruby_ is for the auxiliary functions of the ruby command and rb_ is for the official interfaces”.
“Then, why do variables like ruby_scope have ruby_?”, I asked further. It seems this is just a coincidence. Variables like ruby_scope were originally named the_xxxx, but in the middle of version 1.3 there was a change to add prefixes to all interfaces. At that time ruby_ was added to the “may-be-internals-for-some-reasons” variables.
The bottom line is that ruby_ is attached to things that support the ruby command or to internal variables, and rb_ is attached to the official interfaces of the ruby interpreter.
main()
First, straightforwardly, I'll start with main(). It is nice that this is very short.
▼ main()
    int
    main(argc, argv, envp)
        int argc;
        char **argv, **envp;
    {
    #if defined(NT)
        NtInitialize(&argc, &argv);
    #endif
    #if defined(__MACOS__) && defined(__MWERKS__)
        argc = ccommand(&argv);
    #endif

        ruby_init();
        ruby_options(argc, argv);
        ruby_run();
        return 0;
    }
(main.c)
The NT in #ifdef NT is obviously the NT of Windows NT. But somehow NT is also defined on Win9x, so it really means the Win32 environment. NtInitialize() initializes argc, argv and the socket system (WinSock) for Win32. Because this function only does initialization, it is not interesting and not related to the main topic. Thus, I omit it.
And __MACOS__ is not “Ma-Ko-Su” but Mac OS. In this case, it means Mac OS 9 and before, and it does not include Mac OS X. Even though such an #ifdef remains, as I wrote at the beginning of this book, the current version cannot run on Mac OS 9 and before. It is just a legacy from when ruby was able to run on it. Therefore, I also omit this code.
By the way, as readers who are familiar with the C language probably know, identifiers starting with an underscore are reserved for the system libraries or the OS. However, although they are called “reserved”, using them almost never results in an error; only with a slightly weird cc could it result in one. One example is the cc of HP-UX. HP-UX is a UNIX made by HP. If anyone is of the opinion that HP-UX is not weird, I would deny it out loud.
Anyway, conventionally, we don't define such identifiers in user applications.
Now, I'll briefly explain the built-in Ruby interfaces.
ruby_init()
ruby_init() initializes the Ruby interpreter. Since only a single interpreter of the current Ruby can exist in a process, it needs neither arguments nor a return value. This point is generally considered a “lack of features”.
When there's only a single interpreter, things around the development environment are, more than anything, especially troublesome. Namely, applications such as irb, RubyWin, and RDE. Even when a rewritten program is loaded, the classes which are supposed to be deleted remain. Countering this with the reflection API is not impossible but requires a lot of effort.
However, it seems that Mr. Matsumoto (Matz) purposely limits the number of interpreters to one. The reason seems to be that “it's impossible to initialize completely”. For instance, “the loaded extension libraries could not be removed” is given as an example.
The code of ruby_init() is omitted because it's unnecessary to read.
ruby_options()
ruby_options() is what parses the command-line options for the Ruby interpreter. Of course, depending on the command, we do not have to use it.
Inside this function, -r (load a library) and -e (pass a program from the command line) are processed. This is also where the file passed as a command-line argument is parsed as a Ruby program.
The ruby command reads the main program from a file if one was given, otherwise from stdin. After that, using rb_compile_string() or rb_compile_file() introduced in Part 2, it compiles the text into a syntax tree. The result is set into the global variable ruby_eval_tree.
I also omit the code of ruby_options() because it just does the necessary things one by one and is not interesting.
ruby_run()
Finally, ruby_run() starts to evaluate the syntax tree which was set to ruby_eval_tree. We also don't always need to call this function. Other than ruby_run(), for instance, we can evaluate a string by using a function named rb_eval_string().
▼ ruby_run()
    void
    ruby_run()
    {
        int state;
        static int ex;
        volatile NODE *tmp;

        if (ruby_nerrs > 0) exit(ruby_nerrs);

        Init_stack((void*)&tmp);
        PUSH_TAG(PROT_NONE);
        PUSH_ITER(ITER_NOT);
        if ((state = EXEC_TAG()) == 0) {
            eval_node(ruby_top_self, ruby_eval_tree);
        }
        POP_ITER();
        POP_TAG();

        if (state && !ex) ex = state;
        ruby_stop(ex);
    }
(eval.c)
We can see the macros PUSH_xxxx(), but we can ignore them for now. I'll explain them later when the time comes. The only important thing here is eval_node(). Its content is:
▼ eval_node()
    static VALUE
    eval_node(self, node)
        VALUE self;
        NODE *node;
    {
        NODE *beg_tree = ruby_eval_tree_begin;

        ruby_eval_tree_begin = 0;
        if (beg_tree) {
            rb_eval(self, beg_tree);
        }

        if (!node) return Qnil;
        return rb_eval(self, node);
    }
(eval.c)
This calls rb_eval() on ruby_eval_tree. ruby_eval_tree_begin stores the statements registered by BEGIN. But this is also not important.
And ruby_stop(), called inside ruby_run(), terminates all threads, finalizes all objects, checks exceptions and, in the end, calls exit(). This is also not important, so we won't look at it.
rb_eval()
Outline
Now, rb_eval(). This function is the very core of ruby. One rb_eval() call processes a single NODE, and the whole syntax tree is processed by calling it recursively. (Fig.1)
Fig.1: rb_eval
Like yylex(), rb_eval is made of a huge switch statement that branches on the type of each node. First, let's look at the outline.
▼ rb_eval() Outline
    static VALUE
    rb_eval(self, n)
        VALUE self;
        NODE *n;
    {
        NODE *nodesave = ruby_current_node;
        NODE * volatile node = n;
        int state;
        volatile VALUE result = Qnil;

    #define RETURN(v) do { \
        result = (v);      \
        goto finish;       \
    } while (0)

      again:
        if (!node) RETURN(Qnil);

        ruby_last_node = ruby_current_node = node;
        switch (nd_type(node)) {
          case NODE_BLOCK:
            .....
          case NODE_POSTEXE:
            .....
          case NODE_BEGIN:
                    :
            (plenty of case statements)
                    :
          default:
            rb_bug("unknown node type %d", nd_type(node));
        }
      finish:
        CHECK_INTS;
        ruby_current_node = nodesave;
        return result;
    }
(eval.c)
In the omitted part, the code to process every kind of node is listed. By branching like this, it processes each node. When the code for a node is short, it is handled inside rb_eval(). When it grows long, it becomes a separate function. Most of the functions in eval.c were created this way.
When returning a value from rb_eval(), it uses the macro RETURN() instead of return, in order to always pass through CHECK_INTS. Since this macro is related to threads, you can ignore it until the chapter about them.
And finally, the local variables result and node are volatile for the sake of GC.
NODE_IF
Now, taking the if statement as an example, let's look concretely at how rb_eval() evaluates. From here on, in the description of rb_eval(),
The source code (a Ruby program)
Its corresponding syntax tree
The partial code of rb_eval() to process the node
these three will be listed at the beginning.
▼ source program
    if true
      'true expr'
    else
      'false expr'
    end
▼ its corresponding syntax tree (nodedump)
    NODE_NEWLINE
    nd_file = "if"
    nd_nth  = 1
    nd_next:
        NODE_IF
        nd_cond:
            NODE_TRUE
        nd_body:
            NODE_NEWLINE
            nd_file = "if"
            nd_nth  = 2
            nd_next:
                NODE_STR
                nd_lit = "true expr":String
        nd_else:
            NODE_NEWLINE
            nd_file = "if"
            nd_nth  = 4
            nd_next:
                NODE_STR
                nd_lit = "false expr":String
As we've seen in Part 2, elsif and unless can, by contriving the way they are assembled, be bundled into the single NODE_IF type, so we don't have to treat them specially.
▼ rb_eval() - NODE_IF
    case NODE_IF:
      if (trace_func) {
          call_trace_func("line", node, self,
                          ruby_frame->last_func,
                          ruby_frame->last_class);
      }
      if (RTEST(rb_eval(self, node->nd_cond))) {
          node = node->nd_body;
      }
      else {
          node = node->nd_else;
      }
      goto again;
(eval.c)
Only the last if statement is important. Rewritten without any change in meaning, it becomes this:
    if (RTEST(rb_eval(self, node->nd_cond))) {     (A)
        RETURN(rb_eval(self, node->nd_body));      (B)
    }
    else {
        RETURN(rb_eval(self, node->nd_else));      (C)
    }
First, at (A), it evaluates (the node of) the Ruby condition expression and tests its value with RTEST(). I've mentioned that RTEST() is a macro to test whether or not a VALUE is true in Ruby. If it is true, the then-side clause is evaluated at (B). If it is false, the else-side clause is evaluated at (C).
In addition, I've mentioned that an if statement in Ruby also has its own value, so it is necessary to return a value. Since the value of an if is the value of whichever side was executed, the then side or the else side, that value is returned with the macro RETURN().
In the original listing, it does not call rb_eval() recursively but just does a goto. This is the “conversion from tail recursion to goto” which also appeared in the previous chapter, “Syntax tree construction”.
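The shape of rb_eval(), a switch on the node type with recursion for sub-expressions and a jump back to the top for tail positions, can be imitated in Ruby. This is a toy sketch, not ruby's evaluator: MiniNode, mini_eval and the symbols used as node kinds are all invented here, and a while loop replaces goto again.

```ruby
# Sketch: a toy tree-walking evaluator shaped like rb_eval().
# MiniNode, mini_eval and the node kinds are hypothetical.
MiniNode = Struct.new(:kind, :a, :b, :c)

def mini_eval(node)
  while true                          # plays the role of "again:"
    return nil if node.nil?           # if (!node) RETURN(Qnil)
    case node.kind
    when :true  then return true
    when :false then return false
    when :nil   then return nil
    when :str   then return node.a    # a = the literal value
    when :newline
      node = node.a                   # a = next node; loop instead of recursing
    when :if
      # a = condition, b = then side, c = else side.
      # Only the condition needs a recursive call.
      node = mini_eval(node.a) ? node.b : node.c
    else
      raise ArgumentError, "unknown node type #{node.kind}"
    end
  end
end

# The tree for: if true 'true expr' else 'false expr' end
tree = MiniNode.new(:newline,
         MiniNode.new(:if,
           MiniNode.new(:true),
           MiniNode.new(:newline, MiniNode.new(:str, "true expr")),
           MiniNode.new(:newline, MiniNode.new(:str, "false expr"))))
p mini_eval(tree)
```

An if without an else side simply has nil in the c slot, and the nil check at the top of the loop then yields nil as the value of the whole if.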
NODE_NEWLINE
Since there was a NODE_NEWLINE in the tree for the if statement, let's look at the code for it.
▼ rb_eval() - NODE_NEWLINE
    case NODE_NEWLINE:
      ruby_sourcefile = node->nd_file;
      ruby_sourceline = node->nd_nth;
      if (trace_func) {
          call_trace_func("line", node, self,
                          ruby_frame->last_func,
                          ruby_frame->last_class);
      }
      node = node->nd_next;
      goto again;
(eval.c)
There's nothing particularly difficult.
call_trace_func() has already appeared at NODE_IF. Here is a simple explanation of what it is. It is a feature for tracing a Ruby program from the Ruby level. The debugger (debug.rb), the tracer (tracer.rb), the profiler (profile.rb), irb (the interactive ruby command) and others use this feature.
By using the function-like method set_trace_func you can register a Proc object to trace with, and that Proc object is stored into trace_func. If trace_func is not 0, that is, not Qfalse, it is considered to be a Proc object and is executed (at call_trace_func()).
This call_trace_func() has nothing to do with the main topic and is not very interesting either. Therefore, from now on in this book, I'll completely ignore it. If you are interested in it, I'd like you to take up the challenge after finishing Chapter 16: Blocks.
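For reference, this is roughly what the Ruby-level side of the feature looks like. It is a minimal sketch that assumes an interpreter where Kernel#set_trace_func is still available (newer versions recommend TracePoint instead). The method name trace_line_events is invented for this illustration.

```ruby
# Sketch: registering a Proc with set_trace_func and collecting "line" events.
# trace_line_events is a hypothetical helper name.
def trace_line_events
  events = []
  tracer = proc do |event, _file, line, _id, _binding, _klass|
    events << [event, line] if event == "line"
  end
  set_trace_func(tracer)   # the Proc is what ends up stored in trace_func
  x = 1
  y = x + 1
  set_trace_func(nil)      # unregister, so nothing else is traced
  events
end

recorded = trace_line_events
p recorded.length
```

Each recorded pair holds the event name and the line number, which correspond to the "line" string and the node position passed to call_trace_func() in the C code above.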
Pseudo-local Variables
NODE_IF and the like are interior nodes in a syntax tree. Let's look at the leaves, too.
▼ rb_eval() Pseudo-Local Variable Nodes
    case NODE_SELF:
      RETURN(self);

    case NODE_NIL:
      RETURN(Qnil);

    case NODE_TRUE:
      RETURN(Qtrue);

    case NODE_FALSE:
      RETURN(Qfalse);
(eval.c)
We've seen self as the argument of rb_eval(). I'd like you to go back a little and make sure of it. The others probably need no explanation.
Jump Tag
Next, I'd like to explain NODE_WHILE, which corresponds to while, but implementing break or next with recursive function calls alone is difficult. Since ruby makes these constructs possible by using what is called a “jump tag”, I'll start by describing that first.
Simply put, a “jump tag” is a wrapper around setjmp() and longjmp(), which are library functions of the C language. Do you know about setjmp()? This function has already appeared in gc.c, but it is used in a very abnormal way there. setjmp() is usually used to jump across functions. I'll explain by taking the code below as an example. The entry point is parent().
▼ setjmp() and longjmp()
    jmp_buf buf;

    void child2(void) {
        longjmp(buf, 34);  /* go back straight to parent
                              the return value of setjmp becomes 34 */
        puts("This message will never be printed.");
    }

    void child1(void) {
        child2();
        puts("This message will never be printed.");
    }

    void parent(void) {
        int result;
        if ((result = setjmp(buf)) == 0) {
            /* normally returned from setjmp */
            child1();
        } else {
            /* returned from child2 via longjmp */
            printf("%d\n", result);   /* shows 34 */
        }
    }
First, when setjmp() is called in parent(), the execution state at that time is saved into the argument buf. To put it a little more directly, the address of the top of the machine stack and the CPU registers are saved. If the return value of setjmp() is 0, it means setjmp() returned normally, so you can write the subsequent code as usual. This is the if side. Here, it calls child1().
Next, control moves to child2(), which calls longjmp, and then it can go straight back to the place where the argument buf was setjmp-ed. So in this case, it goes back to the setjmp in parent(). When coming back via longjmp, the return value of setjmp becomes the value of the second argument of longjmp, so the else side is executed. And even if we pass 0 to longjmp, it is forced to be another value, so trying that is fruitless.
Fig.2 shows the state of the machine stack. Ordinary functions return only once for each call. However, setjmp() can return twice. Does it help you grasp the concept if I say it is something like fork()?
Fig.2: setjmp() longjmp() Image
Now, we've learned about setjmp() as preparation. In eval.c, EXEC_TAG() corresponds to setjmp() and JUMP_TAG() corresponds to longjmp(). (Fig.3)
Fig.3: “tag jump” image
Looking at this image, it seems that EXEC_TAG() does not take any arguments. Where has jmp_buf gone? Actually, in ruby, jmp_buf is wrapped in the struct named struct tag. Let's look at it.
▼ struct tag
    struct tag {
        jmp_buf buf;
        struct FRAME *frame;   /* FRAME when PUSH_TAG */
        struct iter *iter;     /* ITER  when PUSH_TAG */
        ID tag;                /* tag type */
        VALUE retval;          /* the return value of this jump */
        struct SCOPE *scope;   /* SCOPE when PUSH_TAG */
        int dst;               /* the destination ID */
        struct tag *prev;
    };
(eval.c)
Because there's the member prev, we can infer that struct tag is probably a stack structure using a linked list. Moreover, by looking around it, we can find the macros PUSH_TAG() and POP_TAG(), so it definitely seems to be a stack.
▼PUSH_TAG()POP_TAG()
793staticstructtag*prot_tag;/*thepointertotheheadofthemachinestack*/
795#definePUSH_TAG(ptag)do{\796structtag_tag;\797_tag.retval=Qnil;\798_tag.frame=ruby_frame;\799_tag.iter=ruby_iter;\800_tag.prev=prot_tag;\801_tag.scope=ruby_scope;\802_tag.tag=ptag;\803_tag.dst=0;\804prot_tag=&_tag
818#definePOP_TAG()\819if(_tag.prev)\820_tag.prev->retval=_tag.retval;\821prot_tag=_tag.prev;\822}while(0)
(eval.c)
I’d like you to be flabbergasted here, because the actual tag is allocated entirely on the machine stack as a local variable (Fig. 4). Moreover, do ~ while is divided between the two macros. This might be one of the most awful usages of the C preprocessor. Here are the PUSH/POP macros coupled and expanded to make them easier to read.

do {
    struct tag _tag;
    _tag.prev = prot_tag;   /* save the previous tag */
    prot_tag = &_tag;       /* push a new tag on the stack */
    /* do several things */
    prot_tag = _tag.prev;   /* restore the previous tag */
} while (0);

This method does not have any overhead of function calls, and its cost of memory allocation is next to nothing. This technique is only possible because the ruby evaluator is made of recursive calls of rb_eval().
Fig. 4: the tag stack is embedded in the machine stack
Because of this implementation, PUSH_TAG() and POP_TAG() must appear as a pair within the same function. Plus, since they are not supposed to be used carelessly outside the evaluator, we can’t make them available to other files.
Additionally, let’s also take a look at EXEC_TAG() and JUMP_TAG().
▼ EXEC_TAG() JUMP_TAG()

#define EXEC_TAG()    setjmp(prot_tag->buf)

#define JUMP_TAG(st) do {               \
    ruby_frame = prot_tag->frame;       \
    ruby_iter = prot_tag->iter;         \
    longjmp(prot_tag->buf,(st));        \
} while (0)

(eval.c)
In this way, setjmp and longjmp are wrapped by EXEC_TAG() and JUMP_TAG() respectively. At first sight the name EXEC_TAG() can look like a wrapper of longjmp(), but it is the one that executes setjmp().
Based on all of the above, I’ll explain the mechanism of while. First, when starting while, it does EXEC_TAG() (setjmp). After that, it executes the main body by calling rb_eval() recursively. If there’s break or next, it does JUMP_TAG() (longjmp). Then it can go back to the start point of the while loop (Fig. 5).
Fig. 5: the implementation of while by using “tag jump”
Though break was taken as an example here, break is not the only thing that cannot be implemented without jumping. Even if we limit the case to while, there are next and redo. Additionally, return from a method and exceptions also have to climb over the wall of rb_eval(). And since it’s cumbersome to use a different tag stack for each case, we want a single stack to handle all cases in one way or another.
What we need to make this possible is just to attach information about “what the purpose of this jump is”. Conveniently, the return value of setjmp() can be specified as the argument of longjmp(), so we can use this. The types are expressed by the following flags:
▼ tag type

#define TAG_RETURN  0x1    /* return */
#define TAG_BREAK   0x2    /* break */
#define TAG_NEXT    0x3    /* next */
#define TAG_RETRY   0x4    /* retry */
#define TAG_REDO    0x5    /* redo */
#define TAG_RAISE   0x6    /* general exceptions */
#define TAG_THROW   0x7    /* throw (won't be explained in this book) */
#define TAG_FATAL   0x8    /* fatal : exceptions which are not catchable */
#define TAG_MASK    0xf

(eval.c)
The meanings are as written in each comment. The last one, TAG_MASK, is the bitmask to take these flags out of a return value of setjmp(). This is because the return value of setjmp() can also include information that is not about a “type of jump”.
NODE_WHILE
Now, by examining the code of NODE_WHILE, let’s check the actual usage of tags.
▼ The Source Program

while true
  'true_expr'
end

▼ Its corresponding syntax tree (nodedump-short)

NODE_WHILE
nd_state = 1 (while)
nd_cond:
    NODE_TRUE
nd_body:
    NODE_STR
    nd_lit = "true_expr":String

▼ rb_eval – NODE_WHILE

case NODE_WHILE:
  PUSH_TAG(PROT_NONE);
  result = Qnil;
  switch (state = EXEC_TAG()) {
    case 0:
      if (node->nd_state && !RTEST(rb_eval(self, node->nd_cond)))
          goto while_out;
      do {
        while_redo:
          rb_eval(self, node->nd_body);
        while_next:
          ;
      } while (RTEST(rb_eval(self, node->nd_cond)));
      break;

    case TAG_REDO:
      state = 0;
      goto while_redo;
    case TAG_NEXT:
      state = 0;
      goto while_next;
    case TAG_BREAK:
      state = 0;
      result = prot_tag->retval;
    default:
      break;
  }
while_out:
  POP_TAG();
  if (state) JUMP_TAG(state);
  RETURN(result);

(eval.c)
An idiom that will appear over and over again has appeared in the above code.

PUSH_TAG(PROT_NONE);
switch (state = EXEC_TAG()) {
  case 0:
    /* process normally */
    break;
  case TAG_a:
    state = 0;    /* clear state because the jump waited for comes */
    /* do the process of when jumped with TAG_a */
    break;
  case TAG_b:
    state = 0;    /* clear state because the jump waited for comes */
    /* do the process of when jumped with TAG_b */
    break;
  default:
    break;        /* this jump is not waited for, then ... */
}
POP_TAG();
if (state) JUMP_TAG(state);   /* .. jump again here */

First, since PUSH_TAG() and POP_TAG() are the previously described mechanism, they must always be used as a pair. Also, they need to be written outside of EXEC_TAG(). Then EXEC_TAG() is applied to the jmp_buf that was just pushed. This means doing setjmp(). If the return value is 0, it means returning directly from setjmp(), so the normal processing is done (this usually contains rb_eval()). If the return value of EXEC_TAG() is not 0, it means returning via longjmp(), so it picks out only the jumps it needs by using case and lets the rest (default) pass.
It might be helpful to also see the code of the jumping side. The code below is the handler of the redo node.
▼ rb_eval() – NODE_REDO

case NODE_REDO:
  CHECK_INTS;
  JUMP_TAG(TAG_REDO);
  break;

(eval.c)
As a result of jumping via JUMP_TAG(), it goes back to the last EXEC_TAG(). The return value at that time is the argument TAG_REDO. Being aware of this, I’d like you to look at the code of NODE_WHILE and check what route is taken.
The idiom has been explained enough, so now I’ll explain the code of NODE_WHILE in a little more detail. As mentioned, since the inside of case 0: is the main process, I extracted only that part. Additionally, I moved some labels to enhance readability.

  if (node->nd_state && !RTEST(rb_eval(self, node->nd_cond)))
      goto while_out;
  do {
      rb_eval(self, node->nd_body);
  } while (RTEST(rb_eval(self, node->nd_cond)));
while_out:

There are two places calling rb_eval() on node->nd_cond, which corresponds to the conditional statement. It seems that only the first test of the condition is separated. This is to deal with both do ~ while and while at once. When node->nd_state is 0 it is a do ~ while, when 1 it is an ordinary while. The rest can be understood by following it step by step, so I won’t explain it in particular.
By the way, I feel like it would easily become an infinite loop if there is next or redo in the condition statement. Since that is of course exactly what the code means, it would be the fault of whoever wrote it, but I’m a little curious about it. So I’ve actually tried it.

% ruby -e 'while next do nil end'
-e:1: void value expression

It’s simply rejected at parse time. It’s safe but not an interesting result. What produces this error is value_expr() of parse.y.
The value of an evaluation of while
For a long time while did not have a value, but since ruby 1.7 it has been able to return a value by using break. This time, let’s focus on the flow of the value of an evaluation. Keeping in mind that the value of the local variable result becomes the return value of rb_eval(), I’d like you to look at the following code:

  result = Qnil;
  switch (state = EXEC_TAG()) {
    case 0:
      /* the main process */
    case TAG_REDO:
    case TAG_NEXT:
      /* each jump */

    case TAG_BREAK:
      state = 0;
      result = prot_tag->retval;     (A)
    default:
      break;
  }
  RETURN(result);

What we should focus on is only (A). The return value of the jump seems to be passed via the retval member of prot_tag, which is a struct tag. Here is the passing side:
▼ rb_eval() – NODE_BREAK

#define return_value(v) prot_tag->retval = (v)

case NODE_BREAK:
  if (node->nd_stts) {
      return_value(avalue_to_svalue(rb_eval(self, node->nd_stts)));
  }
  else {
      return_value(Qnil);
  }
  JUMP_TAG(TAG_BREAK);
  break;

(eval.c)
In this way, by using the macro return_value(), it assigns the value to the struct at the top of the tag stack.
This is the basic flow, but in practice there could be another EXEC_TAG() between the EXEC_TAG() of NODE_WHILE and the JUMP_TAG() of NODE_BREAK. For example, the rescue of an exception handling can exist between them.

while cond       # EXEC_TAG() for NODE_WHILE
  begin          # EXEC_TAG() again for rescue
    break 1
  rescue
  end
end

Therefore, it’s hard to determine whether or not the struct tag in use when doing JUMP_TAG() at NODE_BREAK is the one that was pushed at NODE_WHILE. In this case, because retval is propagated in POP_TAG() as shown below, the return value can be passed to the next tag without particular thought.
▼ POP_TAG()

#define POP_TAG()                       \
    if (_tag.prev)                      \
        _tag.prev->retval = _tag.retval;\
    prot_tag = _tag.prev;               \
} while (0)

(eval.c)
This can probably be depicted as Fig. 6.
Fig. 6: Transferring the return value
Exception
As the second example of the usage of “tag jump”, we’ll look at how exceptions are dealt with.
raise
When I explained while, we looked at the setjmp() side first. This time, we’ll look at the longjmp() side first for a change. It’s rb_exc_raise() that is the substance of raise.
▼ rb_exc_raise()

void
rb_exc_raise(mesg)
    VALUE mesg;
{
    rb_longjmp(TAG_RAISE, mesg);
}

(eval.c)
mesg is an exception object (an instance of Exception or one of its subclasses). Notice that it seems to jump with TAG_RAISE this time. And the code below is a heavily simplified rb_longjmp().
▼ rb_longjmp() (simplified)

static void
rb_longjmp(tag, mesg)
    int tag;
    VALUE mesg;
{
    if (NIL_P(mesg))
        mesg = ruby_errinfo;
    set_backtrace(mesg, get_backtrace(mesg));
    ruby_errinfo = mesg;
    JUMP_TAG(tag);
}

Well, though this can be considered a matter of course, this just jumps as usual by using JUMP_TAG().
What is ruby_errinfo? By doing grep a few times, I figured out that this variable is the substance of the global variable $! of Ruby. Since $! indicates the exception that is currently occurring, naturally its substance ruby_errinfo should have the same meaning as well.
The Big Picture
▼ the source program

begin
  raise('exception raised')
rescue
  'rescue clause'
ensure
  'ensure clause'
end

▼ the syntax tree (nodedump-short)

NODE_BEGIN
nd_body:
    NODE_ENSURE
    nd_head:
        NODE_RESCUE
        nd_head:
            NODE_FCALL
            nd_mid = 3857 (raise)
            nd_args:
                NODE_ARRAY [
                0:
                    NODE_STR
                    nd_lit = "exception raised":String
                ]
        nd_resq:
            NODE_RESBODY
            nd_args = (null)
            nd_body:
                NODE_STR
                nd_lit = "rescue clause":String
            nd_head = (null)
        nd_else = (null)
    nd_ensr:
        NODE_STR
        nd_lit = "ensure clause":String

As the right order of rescue and ensure is decided at the parser level, the right order is strictly fixed in the syntax tree as well. NODE_ENSURE is always at the “top”, NODE_RESCUE comes next, and the main body (where raise exists) is the last. Since NODE_BEGIN is a node that does nothing, you can consider NODE_ENSURE to be virtually at the top.
This means, since NODE_ENSURE and NODE_RESCUE are above the main body which we want to protect, we can stop raise by merely doing EXEC_TAG(). Or rather, it is probably more accurate to say that the two nodes are put above it in the syntax tree for this purpose.
ensure
We are going to look at the handler of NODE_ENSURE, which is the node of ensure.
▼ rb_eval() – NODE_ENSURE

case NODE_ENSURE:
  PUSH_TAG(PROT_NONE);
  if ((state = EXEC_TAG()) == 0) {
      result = rb_eval(self, node->nd_head);   (A-1)
  }
  POP_TAG();
  if (node->nd_ensr) {
      VALUE retval = prot_tag->retval;   (B-1)
      VALUE errinfo = ruby_errinfo;

      rb_eval(self, node->nd_ensr);            (A-2)
      return_value(retval);              (B-2)
      ruby_errinfo = errinfo;
  }
  if (state) JUMP_TAG(state);            (B-3)
  break;

(eval.c)
This branch using if is another idiom to deal with tags. It interrupts a jump by doing EXEC_TAG(), then evaluates the ensure clause (node->nd_ensr). As for the flow of the process, it’s probably straightforward.
Again, we’ll try to think about the value of an evaluation. To check the specification first,

begin
  expr0
ensure
  expr1
end

for the above statement, the value of the whole begin will be the value of expr0 regardless of whether or not ensure exists. This behavior is reflected in the code (A-1, 2), so the value of the evaluation of an ensure clause is completely discarded.
At (B-1, 3), it deals with the evaluated value of when a jump occurred in the main body. I mentioned that the value in this case is stored in prot_tag->retval, so it saves the value to a local variable to prevent it from being carelessly overwritten during the execution of the ensure clause (B-1). After the evaluation of the ensure clause, it restores the value by using return_value() (B-2). When no jump has occurred, that is, state == 0, prot_tag->retval is not used in the first place.
rescue
It’s been a little while, so I’ll show the syntax tree of rescue again just in case.
▼ Source Program

begin
  raise()
rescue ArgumentError, TypeError
  'error raised'
end

▼ Its Syntax Tree (nodedump-short)

NODE_BEGIN
nd_body:
    NODE_RESCUE
    nd_head:
        NODE_FCALL
        nd_mid = 3857 (raise)
        nd_args = (null)
    nd_resq:
        NODE_RESBODY
        nd_args:
            NODE_ARRAY [
            0:
                NODE_CONST
                nd_vid = 4733 (ArgumentError)
            1:
                NODE_CONST
                nd_vid = 4725 (TypeError)
            ]
        nd_body:
            NODE_STR
            nd_lit = "error raised":String
        nd_head = (null)
    nd_else = (null)

I’d like you to make sure that (the syntax tree of) the statement to be rescued is “under” NODE_RESCUE.
▼ rb_eval() – NODE_RESCUE

case NODE_RESCUE:
retry_entry:
  {
      volatile VALUE e_info = ruby_errinfo;

      PUSH_TAG(PROT_NONE);
      if ((state = EXEC_TAG()) == 0) {
          result = rb_eval(self, node->nd_head); /* evaluate the body */
      }
      POP_TAG();
      if (state == TAG_RAISE) { /* an exception occurred at the body */
          NODE * volatile resq = node->nd_resq;

          while (resq) { /* deal with the rescue clause one by one */
              ruby_current_node = resq;
              if (handle_rescue(self, resq)) { /* If dealt with by this clause */
                  state = 0;
                  PUSH_TAG(PROT_NONE);
                  if ((state = EXEC_TAG()) == 0) {
                      result = rb_eval(self, resq->nd_body);
                  }                            /* evaluate the rescue clause */
                  POP_TAG();
                  if (state == TAG_RETRY) { /* Since retry occurred, */
                      state = 0;
                      ruby_errinfo = Qnil;  /* the exception is stopped */
                      goto retry_entry;     /* convert to goto */
                  }
                  if (state != TAG_RAISE) {  /* Also by rescue and such */
                      ruby_errinfo = e_info; /* the exception is stopped */
                  }
                  break;
              }
              resq = resq->nd_head; /* move on to the next rescue clause */
          }
      }
      else if (node->nd_else) { /* when there is an else clause, */
          if (!state) { /* evaluate it only when any exception has not occurred. */
              result = rb_eval(self, node->nd_else);
          }
      }
      if (state) JUMP_TAG(state); /* the jump was not waited for */
  }
  break;

(eval.c)
Even though the size is not small, it’s not difficult because it simply deals with the nodes one by one. This is the first time handle_rescue() has appeared, but for some reasons we cannot look at this function now. I’ll explain only its effects here. Its prototype is this,

static int handle_rescue(VALUE self, NODE *resq)

and it determines whether the currently occurring exception (ruby_errinfo) is a subclass of the class that is expressed by resq (TypeError, for instance). The reason why self is passed is that it’s necessary to call rb_eval() inside this function in order to evaluate resq.
The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike2.5 License
Chapter 14: Context
The range covered by this chapter is really broad. First of all, I’ll describe how the internal state of the evaluator is expressed. After that, as an actual example, we’ll read how the state is changed by a class definition statement. Subsequently, we’ll examine how the internal state influences method definition statements. Lastly, we’ll observe how both statements change the behaviors of variable definitions and variable references.
The Ruby stack
Context and Stack
In the image of a typical procedural language, each time a procedure is called, the information necessary to execute the procedure, such as the local variable space and the place to return to, is stored in a struct (a stack frame) and pushed on the stack. When returning from a procedure, the struct on the top of the stack is popped and the state is returned to that of the previous method. The execution image of a C program, which was explained in Chapter 5: Garbage collection, is a perfect example.
What to be careful about here is that only the stack changes during execution; the program, on the contrary, remains unchanged wherever it is. For example, “a reference to the local variable i” is just an order of “give me i of the current frame”; it is not written as “give me i of that frame”. In other words, “only” the state of the stack influences the consequence. This is why, even if a procedure is called at any time and any number of times, we only have to write its code once (Fig. 1).
Fig. 1: What is changing is only the stack
The execution of Ruby is also basically nothing but chained calls of methods, which are procedures, so essentially it has the same image as above. In other words, with the same code, the things being accessed, such as the local variable scope and the block local scope, will change. And these kinds of scopes are expressed by stacks.
However, in Ruby, for instance, you can temporarily go back to a previously used scope by using iterators or Proc. This cannot be implemented by simply pushing and popping a stack. Therefore the frames of the Ruby stack will be intricately rearranged during execution. Although I call it a “stack”, it could be better to consider it as a list.
Other than method calls, the local variable scope can also be changed by class definitions. So method calls do not match the transitions of the local variable scope. Since there are also blocks, it’s necessary to handle them separately. For these various reasons, surprisingly, there are seven stacks.
Stack Pointer     Stack Frame Type    Description
ruby_frame        struct FRAME        the records of method calls
ruby_scope        struct SCOPE        the local variable scope
ruby_block        struct BLOCK        the block scope
ruby_iter         struct iter         whether or not the current FRAME is an iterator
ruby_dyna_vars    struct RVarmap      the block local variable space
ruby_class        VALUE               the class to define methods on
ruby_cref         NODE (NODE_CREF)    the class nesting information
C has only one stack and Ruby has seven stacks, so by simple arithmetic the execution image of Ruby is at least seven times more complicated than that of C. But it is actually not seven times at all; it’s at least twenty times more complicated.
First, I’ll briefly describe these stacks and their stack frame structs. They are defined in either eval.c or env.h. Basically these stack frames are touched only by eval.c … is how it should be if it were possible, but gc.c needs to know the struct types when marking, so some of them are exposed in env.h.
Of course, marking could be done in a file other than gc.c, but that would require separate functions, which causes slowdown. Ordinary programs had better not care about such things, but both the garbage collector and the core of the evaluator are ruby’s biggest bottlenecks, so it’s quite worthwhile to optimize even for just one method call.
ruby_frame
ruby_frame is a stack to record method calls. The stack frame struct is struct FRAME. This terminology is a bit confusing, but please be aware that I’ll distinguish them by writing just “frame” when it means a “stack frame” as a general noun and FRAME when it means struct FRAME.
▼ ruby_frame

extern struct FRAME {
    VALUE self;          /* self */
    int argc;            /* the argument count */
    VALUE *argv;         /* the array of argument values */
    ID last_func;        /* the name of this FRAME (when called) */
    ID orig_func;        /* the name of this FRAME (when defined) */
    VALUE last_class;    /* the class of last_func's receiver */
    VALUE cbase;         /* the base point for searching constants and class variables */
    struct FRAME *prev;
    struct FRAME *tmp;   /* to protect from GC. this will be described later */
    struct RNode *node;  /* the file name and the line number of the currently executed line. */
    int iter;            /* is this called with a block? */
    int flags;           /* the below two */
} *ruby_frame;

#define FRAME_ALLOCA 0   /* FRAME is allocated on the machine stack */
#define FRAME_MALLOC 1   /* FRAME is allocated by malloc */

(env.h)
First of all, since there’s the prev member, you can infer that the stack is made of a linked list (Fig. 2).
Fig. 2: ruby_frame
The fact that ruby_xxxx points to the top stack frame is common to all stacks and won’t be mentioned every time.
The first member of the struct is self. There is also self in the arguments of rb_eval(), so why does this struct remember another self? This is for the C-level functions. More precisely, it’s for rb_call_super(), which corresponds to super. In order to execute super, the receiver of the current method is required, but the caller side of rb_call_super() may not have such information. However, the chain of rb_eval() is interrupted before the execution of the user-defined C code starts. Therefore, the conclusion is that there needs to be a way to obtain the information of self out of nothing. And FRAME is the right place to store it.
Thinking a little further, it’s mysterious that there are argc and argv. Because parameter variables are local variables after all, it is unnecessary to preserve the given arguments after assigning them to the local variables with the same names at the beginning of the method, isn’t it? Then, what are they used for? The answer is that this is actually for super again. In Ruby, when calling super without any arguments, the values of the parameter variables of the method are passed to the method of the superclass. Thus, (the local variable space for) the parameter variables must be reserved.
Additionally, the difference between last_func and orig_func shows up in cases such as when the method is aliased. For instance,

class C
  def orig() end
  alias ali orig
end
C.new.ali

in this case, last_func=ali and orig_func=orig. Not surprisingly, these members also have to do with super.
ruby_scope
ruby_scope is the stack to represent the local variable scope. Method definition statements, class definition statements, module definition statements and singleton class definition statements are all different scopes. The stack frame struct is struct SCOPE.
I’ll call this frame SCOPE.
▼ ruby_scope

extern struct SCOPE {
    struct RBasic super;
    ID *local_tbl;        /* an array of the local variable names */
    VALUE *local_vars;    /* the space to store local variables */
    int flags;            /* the below four */
} *ruby_scope;

#define SCOPE_ALLOCA  0         /* local_vars is allocated by alloca */
#define SCOPE_MALLOC  1         /* local_vars is allocated by malloc */
#define SCOPE_NOSTACK 2         /* POP_SCOPE is done */
#define SCOPE_DONT_RECYCLE 4    /* Proc is created with this SCOPE */

(env.h)
Since the first element is struct RBasic, this is a Ruby object. This is in order to handle Proc objects. For example, let’s try to think about a case like this:

def make_counter
  lvar = 0
  return Proc.new { lvar += 1 }
end

cnt = make_counter()
p cnt.call    # 1
p cnt.call    # 2
p cnt.call    # 3
cnt = nil  # cut the reference. The created Proc finally becomes unnecessary here.

The Proc object created by this method will persist longer than the method that creates it. And, because the Proc can refer to the local variable lvar, the local variables must be preserved until the Proc disappears. Thus, if it were not handled by the garbage collector, no one could determine the time to free it.
There are two reasons why struct SCOPE is separated from struct FRAME. Firstly, things like class definition statements are not method calls but create distinct local variable scopes. Secondly, when a called method is defined in C, Ruby’s local variable space is unnecessary.
ruby_block
struct BLOCK is the real body of a Ruby iterator block or a Proc object; it is also a kind of snapshot of the evaluator at some point. This frame will also be briefly written as BLOCK, in the same manner as FRAME and SCOPE.
▼ ruby_block

static struct BLOCK *ruby_block;

struct BLOCK {
    NODE *var;               /* the block parameters (mlhs) */
    NODE *body;              /* the code of the block body */
    VALUE self;              /* the self when this BLOCK is created */
    struct FRAME frame;      /* the copy of ruby_frame when this BLOCK is created */
    struct SCOPE *scope;     /* the ruby_scope when this BLOCK is created */
    struct BLOCKTAG *tag;    /* the identity of this BLOCK */
    VALUE klass;             /* the ruby_class when this BLOCK is created */
    int iter;                /* the ruby_iter when this BLOCK is created */
    int vmode;               /* the scope_vmode when this BLOCK is created */
    int flags;               /* BLOCK_D_SCOPE, BLOCK_DYNAMIC */
    struct RVarmap *dyna_vars;   /* the block local variable space */
    VALUE orig_thread;       /* the thread that creates this BLOCK */
    VALUE wrapper;           /* the ruby_wrapper when this BLOCK is created */
    struct BLOCK *prev;
};

struct BLOCKTAG {
    struct RBasic super;
    long dst;                /* destination, that is, the place to return */
    long flags;              /* BLOCK_DYNAMIC, BLOCK_ORPHAN */
};

#define BLOCK_D_SCOPE 1      /* having distinct block local scope */
#define BLOCK_DYNAMIC 2      /* BLOCK was taken from a Ruby program */
#define BLOCK_ORPHAN  4      /* the FRAME that creates this BLOCK has finished */

(eval.c)
Note that frame is not a pointer. This is because the entire content of struct FRAME is copied and preserved. struct FRAME is (for better performance) allocated on the machine stack, but a BLOCK could persist longer than the FRAME that creates it, so the preservation is a preparation for that case.
Additionally, struct BLOCKTAG is separated in order to detect the same block when multiple Proc objects are created from the block. The Proc objects that were created from one and the same block have the same BLOCKTAG.
ruby_iter
The stack ruby_iter indicates whether the currently calling method is an iterator (whether it is called with a block). The frame is struct iter. But for consistency I’ll call it ITER.
▼ ruby_iter

static struct iter *ruby_iter;

struct iter {
    int iter;           /* the below three */
    struct iter *prev;
};

#define ITER_NOT 0      /* the currently evaluated method is not an iterator */
#define ITER_PRE 1      /* the method which is going to be evaluated next is an iterator */
#define ITER_CUR 2      /* the currently evaluated method is an iterator */

(eval.c)
Although for each method we can determine whether it is an iterator or not, there’s another struct that is distinct from struct FRAME. Why?
It’s obvious you need to inform the method when “it is an iterator”, but you also need to inform it when “it is not an iterator”. However, pushing a whole BLOCK just for this is very heavy. It would also needlessly increase procedures such as variable references on the caller side. Thus, it’s better to push the smaller and lighter ITER instead of BLOCK. This will be discussed in detail in Chapter 16: Blocks.
ruby_dyna_vars
The block local variable space. The frame struct is struct RVarmap, which we have already seen in Part 2. From now on, I’ll call it just VARS.
▼ struct RVarmap

struct RVarmap {
    struct RBasic super;
    ID id;                  /* the name of the variable */
    VALUE val;              /* the value of the variable */
    struct RVarmap *next;
};

(env.h)
Note that a frame is not a single struct RVarmap but a list of the structs (Fig. 3). And each frame corresponds to a local variable scope. Since it corresponds to a “local variable scope” and not a “block local variable scope”, even if blocks are nested, for instance, only a single list is used to express them. The breaks between blocks are similar to those of the parser: each is expressed by an RVarmap (header) whose id is 0. Details are deferred again. They will be explained in Chapter 16: Blocks.
Fig. 3: ruby_dyna_vars
ruby_class
ruby_class represents the current class on which a method is defined. Since self will be that class in a normal class definition statement, ruby_class == self there. But at the top level or in the middle of particular methods like eval and instance_eval, self != ruby_class is possible.
The frame of ruby_class is a simple VALUE and there’s no particular frame struct. Then, how could it be like a stack? Moreover, there were many structs without the prev pointer; how could these form a stack? The answer is deferred to the next section.
From now on, I’ll call this frame CLASS.
ruby_cref
ruby_cref represents the information of the nesting of a class. I’ll call this frame CREF, with the same way of naming as before. Its struct is …
▼ ruby_cref

static NODE *ruby_cref = 0;

(eval.c)
… surprisingly, NODE. This is used just as a “defined struct which can be pointed to by a VALUE”. The node type is NODE_CREF and the assignments of its members are shown below:

Union Member    Macro To Access    Usage
u1.value        nd_clss            the outer class (VALUE)
u2              –                  –
u3.node         nd_next            preserve the previous CREF

Even though the member name is nd_next, the value it actually holds is the “previous (prev)” CREF. Taking the following program as an example, I’ll explain the actual appearance.

class A
  class B
    class C
      nil   # (A)
    end
  end
end

Fig. 4 shows how ruby_cref is when evaluating the code (A).
Fig. 4: ruby_cref
However, illustrating this image every time is tedious and its intention becomes unclear. Therefore, the same state as Fig. 4 will be expressed in the following notation:

A ← B ← C
PUSH / POP Macros
For each stack frame struct, macros to push and pop are available. For instance, PUSH_FRAME and POP_FRAME for FRAME. Because these will appear in a moment, I’ll explain their usage and content then.
The other states
While they are not as important as the main stacks, the evaluator of ruby has several other states. This is a brief list of them. However, some of them are not stacks. Actually, most of them are not.

Variable Name        Type      Meaning
scope_vmode          int       the default visibility when a method is defined
ruby_in_eval         int       whether or not parsing after the evaluation is started
ruby_current_node    NODE*     the file name and the line number of what is currently being evaluated
ruby_safe_level      int       $SAFE
ruby_errinfo         VALUE     the exception currently being handled
ruby_wrapper         VALUE     the wrapper module to isolate the environment

Module Definition
The class statement, the module statement and the singleton class definition statement are all implemented in similar ways.
Because seeing similar things three times in a row is not interesting, this time let’s examine the module statement, which has the fewest elements (and thus is simple).
First of all, what is the module statement? Or, to put it the other way around, what should happen in the module statement? Let’s try to list several features:

- a new module object should be created
- the created module should be self
- it should have an independent local variable scope
- if you write a constant assignment, a constant should be defined on the module
- if you write a class variable assignment, a class variable should be defined on the module
- if you write a def statement, a method should be defined on the module

What is the way to achieve these things? … is the point of this section. Now, let’s start to look at the code.
Investigation
▼ The Source Program

module M
  a = 1
end

▼ Its Syntax Tree

NODE_MODULE
nd_cname = 9621 (M)
nd_body:
    NODE_SCOPE
    nd_rval = (null)
    nd_tbl = 3 [ _ ~ a ]
    nd_next:
        NODE_LASGN
        nd_cnt = 2
        nd_value:
            NODE_LIT
            nd_lit = 1:Fixnum

nd_cname seems to be the module name. cname is probably either Const NAME or Class NAME. I dumped several things and found that there’s always NODE_SCOPE in nd_body. Since its member nd_tbl holds a local variable table and its name is similar to struct SCOPE, it appears certain that this NODE_SCOPE plays an important role in creating a local variable scope.
NODE_MODULE
Let’s examine the handler of NODE_MODULE in rb_eval(). The parts that are not close to the main line, such as ruby_raise() and error handling, were cut drastically. So far there has been a lot of cutting work over 200 pages, so it has already become unnecessary to show the original code.
▼ rb_eval() − NODE_MODULE (simplified)

case NODE_MODULE:
  {
      VALUE module;

      if (rb_const_defined_at(ruby_class, node->nd_cname)) {
          /* just obtain the already created module */
          module = rb_const_get(ruby_class, node->nd_cname);
      }
      else {
          /* create a new module and set it into the constant */
          module = rb_define_module_id(node->nd_cname);
          rb_const_set(ruby_cbase, node->nd_cname, module);
          rb_set_class_path(module, ruby_class, rb_id2name(node->nd_cname));
      }

      result = module_setup(module, node->nd_body);
  }
  break;

First, we’d like to make sure the module is nested and defined above (the module held by) ruby_class. We can understand it from the fact that it calls rb_const_xxxx() on ruby_class. ruby_cbase appears just once, but it is usually identical to ruby_class, so we can ignore it. Even if they are different, it rarely causes a problem.
In the first half, it branches with if because it needs to check whether the module has already been defined. This is because, in Ruby, we can make “additional” definitions on one and the same module any number of times.

module M
  def a    # M#a is defined
  end
end
module M   # add a definition (not re-defining or overwriting)
  def b    # M#b is defined
  end
end

In this program, the two methods, a and b, will be defined on the module M.
In this case, at the second definition of M the module M has already been set to the constant, so just obtaining and using it is sufficient. If the constant M does not exist yet, it means this is the first definition and the module is created (by rb_define_module_id()).
Lastly, module_setup() is the function executing the body of a module statement. Not only module statements but also class statements and singleton class statements are executed by module_setup(). This is the reason why I said “all of these three types of statements are similar things”. For now, I’d like you to note that node->nd_body (NODE_SCOPE) is passed as an argument.
module_setup
For the module, class and singleton class statements, module_setup() executes their bodies. Finally, the Ruby stack manipulations will appear in large amounts.
▼ module_setup()

static VALUE
module_setup(module, n)
    VALUE module;
    NODE *n;
{
    NODE * volatile node = n;
    int state;
    struct FRAME frame;
    VALUE result;               /* OK */
    TMP_PROTECT;

    frame = *ruby_frame;
    frame.tmp = ruby_frame;
    ruby_frame = &frame;

    PUSH_CLASS();
    ruby_class = module;
    PUSH_SCOPE();
    PUSH_VARS();

    /* (A) ruby_scope->local_vars initialization */
    if (node->nd_tbl) {
        VALUE *vars = TMP_ALLOC(node->nd_tbl[0]+1);
        *vars++ = (VALUE)node;
        ruby_scope->local_vars = vars;
        rb_mem_clear(ruby_scope->local_vars, node->nd_tbl[0]);
        ruby_scope->local_tbl = node->nd_tbl;
    }
    else {
        ruby_scope->local_vars = 0;
        ruby_scope->local_tbl  = 0;
    }

    PUSH_CREF(module);
    ruby_frame->cbase = (VALUE)ruby_cref;
    PUSH_TAG(PROT_NONE);
    if ((state = EXEC_TAG()) == 0) {
        if (trace_func) {
            call_trace_func("class", ruby_current_node, ruby_class,
                            ruby_frame->last_func,
                            ruby_frame->last_class);
        }
        result = rb_eval(ruby_class, node->nd_next);
    }
    POP_TAG();
    POP_CREF();
    POP_VARS();
    POP_SCOPE();
    POP_CLASS();

    ruby_frame = frame.tmp;
    if (trace_func) {
        call_trace_func("end", ruby_last_node, 0,
                        ruby_frame->last_func, ruby_frame->last_class);
    }
    if (state) JUMP_TAG(state);

    return result;
}

(eval.c)
This is too big to read all in one gulp. Let’s cut the parts that seem unnecessary.
First, the parts around trace_func can be deleted unconditionally.
We can see the idioms related to tags. Let’s simplify them by expressing them with Ruby’s ensure.
Immediately after the start of the function, the argument n is purposefully assigned to the local variable node, but volatile is attached to node and it is never assigned after that, so this is to prevent it from being garbage collected. If we assume that the argument was node from the beginning, it would not change the meaning.
In the first half of the function, there’s a part manipulating ruby_frame in a complicated way. It is obviously paired up with the part ruby_frame = frame.tmp in the last half. We’ll focus on this part later, but for the time being it can be considered as a push and pop of ruby_frame.
Plus, it seems that the code (A) can be, as commented, summarized as the initialization of ruby_scope->local_vars. This will be discussed later.
Consequently, it could be summarized as follows:
▼ module_setup (simplified)

static VALUE
module_setup(module, node)
    VALUE module;
    NODE *node;
{
    struct FRAME frame;
    VALUE result;

    push FRAME
    PUSH_CLASS();
    ruby_class = module;
    PUSH_SCOPE();
    PUSH_VARS();
    ruby_scope->local_vars initialization
    PUSH_CREF(module);
    ruby_frame->cbase = (VALUE)ruby_cref;
    begin
        result = rb_eval(ruby_class, node->nd_next);
    ensure
        POP_TAG();
        POP_CREF();
        POP_VARS();
        POP_SCOPE();
        POP_CLASS();
        pop FRAME
    end
    return result;
}

It does rb_eval() with node->nd_next, so this is certainly the code of the module body. The questions are about the rest. There are five points to look at:
What happens in PUSH_SCOPE() and PUSH_VARS()
How the local variable space is allocated
The effect of PUSH_CLASS()
The relationship between ruby_cref and ruby_frame->cbase
What is done by manipulating ruby_frame
Let’s investigate them in order.
Creating a local variable scope
PUSH_SCOPE() pushes a local variable space and PUSH_VARS() pushes a block-local variable space, so together these two create a new local variable scope. Let’s examine the contents of these macros and what they do.
▼ PUSH_SCOPE() POP_SCOPE()
#define PUSH_SCOPE() do { \
    volatile int _vmode = scope_vmode; \
    struct SCOPE * volatile _old; \
    NEWOBJ(_scope, struct SCOPE); \
    OBJSETUP(_scope, 0, T_SCOPE); \
    _scope->local_tbl = 0; \
    _scope->local_vars = 0; \
    _scope->flags = 0; \
    _old = ruby_scope; \
    ruby_scope = _scope; \
    scope_vmode = SCOPE_PUBLIC

#define POP_SCOPE() \
    if (ruby_scope->flags & SCOPE_DONT_RECYCLE) { \
        if (_old) scope_dup(_old); \
    } \
    if (!(ruby_scope->flags & SCOPE_MALLOC)) { \
        ruby_scope->local_vars = 0; \
        ruby_scope->local_tbl  = 0; \
        if (!(ruby_scope->flags & SCOPE_DONT_RECYCLE) && \
            ruby_scope != top_scope) { \
            rb_gc_force_recycle((VALUE)ruby_scope); \
        } \
    } \
    ruby_scope->flags |= SCOPE_NOSTACK; \
    ruby_scope = _old; \
    scope_vmode = _vmode; \
} while (0)
(eval.c)
Like tags, SCOPEs also form a stack that is kept in sync with the machine stack. The slight difference is that the space for each stack frame is allocated on the heap, and the machine stack is used only to build the stack structure (Fig.5).
Fig.5. The machine stack and the SCOPE stack
Additionally, the SCOPE_something flags that appear repeatedly in the macros cannot be explained until I have covered both the form in which each stack frame is stored and blocks. They will therefore be discussed all at once in Chapter 16: Blocks.
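The idea of frames that live on the heap while the link between them lives on the machine stack can be sketched in a few lines of standalone C. This is only an illustration under my own assumptions: struct scope, cur_scope and nested() are made-up names, and none of the flag handling or GC cooperation of the real macros is modeled.

```c
#include <assert.h>
#include <stdlib.h>

struct scope { int depth; };            /* hypothetical, far smaller than struct SCOPE */

static struct scope *cur_scope = NULL;  /* plays the role of ruby_scope */

/* Each call allocates its frame on the heap, but the pointer to the
 * previous frame is kept in the local variable `old`, that is, on the
 * machine stack. Returns the depth seen at the innermost level. */
static int
nested(int levels)
{
    struct scope *old = cur_scope;
    struct scope *s = malloc(sizeof(*s));
    int deepest;

    assert(s != NULL);
    s->depth = old ? old->depth + 1 : 1;
    cur_scope = s;                      /* push */

    deepest = levels > 1 ? nested(levels - 1) : cur_scope->depth;

    cur_scope = old;                    /* pop */
    free(s);
    return deepest;
}
```

Calling nested(3) builds three heap frames, one per C stack frame, and unwinds them in reverse order as the C calls return.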
Allocating the local variable space
As I have mentioned many times, the local variable scope is represented by struct SCOPE. But struct SCOPE is literally a “scope” and has no body of its own in which to store local variables. To put it more precisely, it has a pointer to such a space, but there is still no array at the place the pointer refers to. The following part of module_setup() prepares that array.
▼ The preparation of the local variable slots
if (node->nd_tbl) {
    VALUE *vars = TMP_ALLOC(node->nd_tbl[0]+1);
    *vars++ = (VALUE)node;
    ruby_scope->local_vars = vars;
    rb_mem_clear(ruby_scope->local_vars, node->nd_tbl[0]);
    ruby_scope->local_tbl = node->nd_tbl;
}
else {
    ruby_scope->local_vars = 0;
    ruby_scope->local_tbl  = 0;
}
(eval.c)
The TMP_ALLOC() at the beginning will be described in the next section. Put briefly, it is “an alloca() that is guaranteed to allocate on the stack (so we do not need to worry about GC)”.
node->nd_tbl in fact holds the local variable name table that appeared in Chapter 12: Syntax tree construction. That is, nd_tbl[0] contains the table size and the rest is an array of IDs. This table is stored as is in local_tbl of SCOPE, and local_vars is allocated to store the local variable values. Because the two are easy to confuse, it is a good idea to write yourself notes such as “this is the variable name” and “this is the value”. The one with tbl is for the names.
Fig.6. ruby_scope->local_vars
Where is this node used? I examined all the local_vars members but could not find any access to index -1 in eval.c. Expanding the range of files to investigate, I found the access in gc.c.
▼ rb_gc_mark_children() − T_SCOPE
case T_SCOPE:
  if (obj->as.scope.local_vars &&
        (obj->as.scope.flags & SCOPE_MALLOC)) {
      int n = obj->as.scope.local_tbl[0]+1;
      VALUE *vars = &obj->as.scope.local_vars[-1];

      while (n--) {
          rb_gc_mark(*vars);
          vars++;
      }
  }
  break;
(gc.c)
Apparently, this is a mechanism to protect node from GC. But why is it necessary to mark it here? node is purposely stored in a volatile local variable, so it would not be garbage collected during the execution of module_setup().
Honestly speaking, for a while I thought it might merely be a mistake, but it turned out to be very important. The issue is this line, a little further down:
▼ ruby_scope->local_tbl
ruby_scope->local_tbl = node->nd_tbl;
(eval.c)
The local variable name table prepared by the parser is used directly. When is this table freed? When the node is no longer referred to from anywhere. Then, when may the node be freed? Only after the SCOPE assigned on this line has disappeared completely. And when is that?
A SCOPE sometimes persists longer than the statement that created it. As will be discussed in Chapter 16: Blocks, if a Proc object is created, it refers to the SCOPE. Thus, even after module_setup() has finished, the SCOPE created there is not necessarily unused. That is why it is not sufficient for node to be referred to only from (the stack frame of) module_setup(). It must be referred to “directly” from the SCOPE.
On the other hand, the volatile local variable node cannot be removed either. Without it, node would be floating in the air until it is assigned to local_vars.
But then, isn’t local_vars of the SCOPE unsafe as well? TMP_ALLOC() is, as I mentioned, an allocation on the stack, so it becomes invalid when module_setup() ends. In fact, at the moment a Proc is created, the allocation method is abruptly switched to malloc(). Details will be described in Chapter 16: Blocks.
Lastly, rb_mem_clear() looks like zero-filling, but it actually fills an array of VALUE with Qnil (array.c). With this, all defined local variables are initialized to nil.
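The layout of local_vars, with one hidden slot at index -1 in front of the visible slots, may be easier to see in isolation. The following is a minimal sketch, not ruby’s code: value_t, NIL_VALUE, alloc_local_vars() and free_local_vars() are hypothetical names, the number chosen for NIL_VALUE is arbitrary, and plain malloc() is used where module_setup() uses TMP_ALLOC().

```c
#include <assert.h>
#include <stdlib.h>

typedef long value_t;               /* hypothetical stand-in for VALUE */
#define NIL_VALUE ((value_t)4)      /* stand-in for Qnil; the number is arbitrary here */

/* Allocates n visible slots plus one hidden slot in front of them.
 * The hidden slot (index -1) remembers `owner`, as module_setup()
 * does with node, and the n visible slots are filled with NIL_VALUE. */
static value_t *
alloc_local_vars(value_t owner, size_t n)
{
    value_t *base = malloc((n + 1) * sizeof(value_t));
    value_t *vars;
    size_t i;

    assert(base != NULL);
    *base = owner;                  /* corresponds to *vars++ = (VALUE)node */
    vars = base + 1;
    for (i = 0; i < n; i++) vars[i] = NIL_VALUE;
    return vars;
}

/* The block starts one slot before the pointer handed out. */
static void
free_local_vars(value_t *vars)
{
    free(vars - 1);
}
```

A marker that wants to reach the owner only needs the vars pointer: it starts at &vars[-1] and walks n + 1 slots, which is the shape of the loop in the gc.c excerpt.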
TMP_ALLOC
Next, let’s read TMP_ALLOC, which allocates the local variable space. This macro is actually paired with the TMP_PROTECT that sits quietly at the beginning of module_setup(). Its typical usage is this:
VALUE *ptr;
TMP_PROTECT;

ptr = TMP_ALLOC(size);
The reason why TMP_PROTECT is placed among the local variable definitions is that… Let’s see its definition.
▼ TMP_ALLOC()
#ifdef C_ALLOCA
# define TMP_PROTECT NODE * volatile tmp__protect_tmp=0
# define TMP_ALLOC(n) \
    (tmp__protect_tmp = rb_node_newnode(NODE_ALLOCA, \
                             ALLOC_N(VALUE,n), tmp__protect_tmp, n), \
     (void*)tmp__protect_tmp->nd_head)
#else
# define TMP_PROTECT typedef int foobazzz
# define TMP_ALLOC(n) ALLOCA_N(VALUE,n)
#endif
(eval.c)
…it is because it defines a local variable.
As described in Chapter 5: Garbage collection, in the #ifdef C_ALLOCA environment (that is, where the native alloca() does not exist) malloc() is used to emulate alloca(). However, the arguments of a method are obviously VALUEs, and the GC could not find a VALUE if it were stored in the heap. Therefore, the space is tied to a NODE so that the GC is sure to find it.
Fig.7. Anchor the space to the stack through NODE
Conversely, in an environment with the true alloca(), we can simply use alloca() and there is no need for TMP_PROTECT. Thus, an arbitrary harmless statement is written instead.
By the way, why are they so eager to use alloca()? Merely because “alloca() is faster than malloc()”, they say. One might think such a tiny difference is not worth caring about, but because the core of the evaluator is the biggest bottleneck of ruby, … the same as above.
Changing the place to define methods on
The value at the top of the ruby_class stack is the place where a method is defined at that time. Conversely, if one pushes a value onto ruby_class, the class on which methods are defined changes. This is exactly what a class statement needs. Therefore, module_setup() also has to do PUSH_CLASS(). Here is the code for it:
PUSH_CLASS();
ruby_class = module;
     :
     :
POP_CLASS();
Why is there an assignment to ruby_class after PUSH_CLASS()? It is surprisingly easy to understand once we look at the definition.
▼ PUSH_CLASS() POP_CLASS()
#define PUSH_CLASS() do { \
    VALUE _class = ruby_class

#define POP_CLASS() ruby_class = _class; \
} while (0)
(eval.c)
Because PUSH_CLASS() does not modify ruby_class, nothing is actually pushed until it is set by hand. Thus, these two are closer to “save and restore” than to “push and pop”.
You might think the macro would be cleaner if the class were passed as the argument of PUSH_CLASS()… That is absolutely true, but because there are some places where the class cannot be obtained before pushing, it is done this way.
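To make the “save and restore” point concrete, here is a minimal sketch of the same macro shape in standalone C. It is not ruby’s code: current_class, SAVE_CLASS(), RESTORE_CLASS() and with_class() are names I made up, and an int stands in for the class VALUE.

```c
#include <assert.h>

/* Hypothetical stand-in for ruby_class: the "current class" global. */
static int current_class = 1;

/* The opening macro starts a block and saves the old value in a local;
 * the closing macro restores it and ends the block. Like PUSH_CLASS(),
 * the opening macro does not change the global by itself. */
#define SAVE_CLASS()    do { int saved_class_ = current_class
#define RESTORE_CLASS() current_class = saved_class_; } while (0)

/* Runs a region with `klass` as the current class and returns the
 * value that was observed inside the region. */
static int
with_class(int klass)
{
    int seen;

    assert(klass != 0);        /* 0 is reserved for "no class" in this sketch */
    SAVE_CLASS();
    current_class = klass;     /* the "push" is done by hand */
    seen = current_class;
    RESTORE_CLASS();
    return seen;
}
```

Because the saved value lives in a block-local variable, the two macros have to be used as a pair inside the same function, which is also true of the real PUSH_CLASS() and POP_CLASS().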
Nesting Classes
ruby_cref represents the class nesting information at runtime. Therefore, it is naturally expected that ruby_cref is pushed at module statements and class statements. In module_setup(), it is pushed as follows:
PUSH_CREF(module);
ruby_frame->cbase = (VALUE)ruby_cref;
   :
   :
POP_CREF();
Here, module is the module being defined. Let’s also see the definitions of PUSH_CREF() and POP_CREF().
▼ PUSH_CREF() POP_CREF()
#define PUSH_CREF(c) \
    ruby_cref = rb_node_newnode(NODE_CREF,(c),0,ruby_cref)
#define POP_CREF() ruby_cref = ruby_cref->nd_next
(eval.c)
Unlike PUSH_SCOPE and the like, there are no complicated techniques here, so it is very easy to deal with. Then again, it would also be a problem if there were nothing of the kind at all.
The problem that remains unsolved is the meaning of ruby_frame->cbase. It is the information for referring to a class variable or a constant from the current FRAME. Details will be discussed in the last section of this chapter.
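The cref chain is an ordinary singly linked list whose head is moved on push and pop. A minimal sketch, assuming a made-up struct cref with an int in place of the class, might look like this. Unlike ruby, which leaves unreachable nodes to the GC, the sketch simply never frees them.

```c
#include <assert.h>
#include <stdlib.h>

struct cref {                 /* hypothetical miniature of a NODE_CREF */
    int klass;                /* stands in for nd_clss */
    struct cref *next;        /* stands in for nd_next */
};

static struct cref *cur_cref = NULL;   /* plays the role of ruby_cref */

/* Corresponds to PUSH_CREF(c): a new node is linked in front. */
static void
push_cref(int klass)
{
    struct cref *c = malloc(sizeof(*c));

    assert(c != NULL);
    c->klass = klass;
    c->next = cur_cref;
    cur_cref = c;
}

/* Corresponds to POP_CREF(): only the head pointer moves. The node is
 * deliberately not freed, because something else may still point to it. */
static void
pop_cref(void)
{
    assert(cur_cref != NULL);
    cur_cref = cur_cref->next;
}
```

The important property is that a pointer taken to the head before a pop keeps seeing the same chain afterwards, which is what allows the nesting information to be remembered elsewhere.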
Replacing frames
Lastly, let’s focus on the manipulation of ruby_frame. The first thing is its definition:
struct FRAME frame;
It is not a pointer. This means that the entire FRAME is allocated on the stack. Both the management structure of the Ruby stack and the local variable space are on the stack, but in the case of FRAME the entire struct is stored on the stack. The extreme consumption of the machine stack by ruby is the fruit of these “small techniques” piling up.
Next, let’s look at what is done with frame.
frame = *ruby_frame;      /* copy the entire struct */
frame.tmp = ruby_frame;   /* protect the original FRAME from GC */
ruby_frame = &frame;      /* replace ruby_frame */
       :
       :
ruby_frame = frame.tmp;   /* restore */
That is, ruby_frame seems to be temporarily replaced (not pushed). Why does it do such a thing?
I described FRAME as being “pushed on method calls”, but to be more precise, it is the stack frame that represents “the main environment for executing a Ruby program”. You can infer this from, for instance, ruby_frame->cbase, which appeared previously. last_func, which is “the name of the last called method”, also suggests it.
Then, why is FRAME not pushed straightforwardly? Because this is a place where pushing a FRAME is not allowed. One would like to push a FRAME, but if a FRAME is pushed, it will appear in the backtrace of the program when an exception occurs. A backtrace is what is displayed like this:
% ruby t.rb
t.rb:11:in `c': some error occured (ArgumentError)
        from t.rb:7:in `b'
        from t.rb:3:in `a'
        from t.rb:14
But module statements and class statements are not method calls, so it is not desirable for them to appear here. That is why the frame is “replaced” instead of “pushed”.
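The difference between replacing and pushing can be sketched with a tiny frame chain. Everything below is hypothetical (struct frame has only three members, and backtrace_depth() merely counts links), but it shows why a copied frame does not lengthen the chain: the copy inherits the prev link of the original.

```c
#include <assert.h>
#include <stddef.h>

struct frame {                 /* hypothetical miniature of struct FRAME */
    int cbase;
    struct frame *prev;        /* the link a backtrace would follow */
    struct frame *tmp;         /* remembers the frame that was replaced */
};

static struct frame top_frame = { 0, NULL, NULL };
static struct frame *cur_frame = &top_frame;   /* plays the role of ruby_frame */

/* Counts the frames a backtrace would visit. */
static int
backtrace_depth(void)
{
    int n = 0;
    const struct frame *f;

    for (f = cur_frame; f; f = f->prev) n++;
    return n;
}

/* Runs with a private copy of the current frame whose cbase is changed,
 * and returns the backtrace depth seen while the copy is in place. */
static int
with_replaced_frame(int cbase)
{
    struct frame frame;
    int depth;

    assert(cur_frame != NULL);
    frame = *cur_frame;        /* copy the entire struct, prev included */
    frame.tmp = cur_frame;
    cur_frame = &frame;        /* replace, not push */

    cur_frame->cbase = cbase;  /* only the copy is modified */
    depth = backtrace_depth();

    cur_frame = frame.tmp;     /* restore */
    return depth;
}
```

The depth seen inside equals the depth outside, and the original frame’s cbase is untouched after the restore.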
The method definition
As the next topic of the module definitions, let’s look at the method definitions.
Investigation
▼ The Source Program
def m(a, b, c)
  nil
end
▼ Its Syntax Tree
NODE_DEFN
nd_mid  = 9617 (m)
nd_noex = 2 (NOEX_PRIVATE)
nd_defn:
    NODE_SCOPE
    nd_rval = (null)
    nd_tbl = 5 [ _ ~ a b c ]
    nd_next:
        NODE_ARGS
        nd_cnt  = 3
        nd_rest = -1
        nd_opt = (null)
        NODE_NIL
I dumped several things and found that there is always a NODE_SCOPE in nd_defn. NODE_SCOPE is, as we saw with the module statements, the node that stores the information needed to push a local variable scope.
NODE_DEFN
Next, we will examine the corresponding code of rb_eval(). This part contains a lot of tedious error handling, so it is all omitted again. The way of omitting is as usual: delete every part that directly or indirectly calls rb_raise(), rb_warn(), or rb_warning().
▼ rb_eval() − NODE_DEFN (simplified)
NODE *defn;
int noex;

if (SCOPE_TEST(SCOPE_PRIVATE) || node->nd_mid == init) {
    noex = NOEX_PRIVATE;                 /* (A) */
}
else if (SCOPE_TEST(SCOPE_PROTECTED)) {
    noex = NOEX_PROTECTED;               /* (B) */
}
else if (ruby_class == rb_cObject) {
    noex = node->nd_noex;                /* (C) */
}
else {
    noex = NOEX_PUBLIC;                  /* (D) */
}

defn = copy_node_scope(node->nd_defn, ruby_cref);
rb_add_method(ruby_class, node->nd_mid, defn, noex);
result = Qnil;
In the first half there are words like private and protected, so it is probably related to visibility. noex, which is used in the names of the flags, seems to stand for NOde EXposure. Let’s examine the if statements in order.
(A) SCOPE_TEST() is a macro that checks whether the flag given as its argument is set in scope_vmode. Therefore, the first half of this condition means “is it a private scope?”. The last half means “it is private if this is defining initialize”. The method initialize, which initializes an object, unconditionally becomes private.
(B) It is protected if the scope is protected (not surprisingly). My feeling is that there are few cases where protected is required in Ruby.
(C) This is a bug. I found it just before submitting this book, so I could not fix it beforehand. In the latest code this part has probably already been removed. The original intention is to force the methods defined at top level to be private.
(D) If none of the above conditions holds, it is public.
Actually, nothing up to here is worth worrying about. The important part is the next two lines.
defn = copy_node_scope(node->nd_defn, ruby_cref);
rb_add_method(ruby_class, node->nd_mid, defn, noex);
copy_node_scope() is a function that copies (only) the NODE_SCOPE attached to the top of the method body. It is important that ruby_cref is passed… but details will be described soon.
After copying, the definition is completed by adding it with rb_add_method(). The place to define it on is of course ruby_class.
copy_node_scope()
copy_node_scope() is called from only two places: the method definition (NODE_DEFN) and the singleton method definition (NODE_DEFS) in rb_eval(). Therefore, looking at these two is sufficient to see how it is used. Besides, the usages at these two places are almost the same.
▼ copy_node_scope()
static NODE*
copy_node_scope(node, rval)
    NODE *node;
    VALUE rval;
{
    NODE *copy = rb_node_newnode(NODE_SCOPE,0,rval,node->nd_next);

    if (node->nd_tbl) {
        copy->nd_tbl = ALLOC_N(ID, node->nd_tbl[0]+1);
        MEMCPY(copy->nd_tbl, node->nd_tbl, ID, node->nd_tbl[0]+1);
    }
    else {
        copy->nd_tbl = 0;
    }
    return copy;
}
(eval.c)
I mentioned that the argument rval is the class nesting information (ruby_cref) at the time the method is defined. Apparently, it is named rval because it will be set to nd_rval.
The main if statement copies the nd_tbl of NODE_SCOPE, in other words the local variable name table. The +1 at ALLOC_N is there to additionally allocate the space for nd_tbl[0]. As we saw in Part 2, nd_tbl[0] holds the number of local variables, which is “the actual length of nd_tbl − 1”.
To summarize, copy_node_scope() makes a copy of the NODE_SCOPE that is the header of the method body. In addition, nd_rval is set to the ruby_cref (the class nesting information) at the time the method is defined. This information will be used later when referring to constants or class variables.
rb_add_method()
The next thing is rb_add_method(), the function that registers a method entry.
▼ rb_add_method()
void
rb_add_method(klass, mid, node, noex)
    VALUE klass;
    ID mid;
    NODE *node;
    int noex;
{
    NODE *body;

    if (NIL_P(klass)) klass = rb_cObject;
    if (ruby_safe_level >= 4 &&
        (klass == rb_cObject || !OBJ_TAINTED(klass))) {
        rb_raise(rb_eSecurityError, "Insecure: can't define method");
    }
    if (OBJ_FROZEN(klass)) rb_error_frozen("class/module");
    rb_clear_cache_by_id(mid);
    body = NEW_METHOD(node, noex);
    st_insert(RCLASS(klass)->m_tbl, mid, body);
}
(eval.c)
NEW_METHOD() is a macro that creates a NODE. rb_clear_cache_by_id() is a function that manipulates the method cache. This will be explained in the next chapter, “Method”.
Let’s look at the syntax tree that is eventually stored in the m_tbl of a class. I prepared nodedump-method for this kind of purpose. (nodedump-method: comes with nodedump. nodedump is tools/nodedump.tar.gz on the attached CD-ROM)
% ruby -e '
class C
  def m(a)
    puts "ok"
  end
end
require "nodedump-method"
NodeDump.dump C, :m        # dump the method m of the class C
'
NODE_METHOD
nd_noex = 0 (NOEX_PUBLIC)
nd_cnt = 0
nd_body:
    NODE_SCOPE
    nd_rval = Object <- C
    nd_tbl = 3 [ _ ~ a ]
    nd_next:
        NODE_ARGS
        nd_cnt  = 1
        nd_rest = -1
        nd_opt = (null)

** unhandled**
At the top there is a NODE_METHOD, and next comes the NODE_SCOPE previously copied by copy_node_scope(). These probably represent the header of a method. I dumped several things and found no NODE_SCOPE in methods defined in C, so it seems to indicate that the method is defined at the Ruby level.
Additionally, the parameter variable name (a) appears in nd_tbl of NODE_SCOPE. I mentioned that parameter variables are equivalent to local variables, and this hints at it.
I’ll omit the explanation of NODE_ARGS here because it will be described in the next chapter, “Method”.
Lastly, there is no need to care much about the nd_cnt of NODE_METHOD this time. It is used in connection with alias.
Assignment and Reference
Come to think of it, most of the stacks are used to realize a variety of variables. We have learned how the various stacks are pushed; this time let’s examine the code that references variables.
Local variable
All the information necessary to assign or refer to local variables has already appeared, so you can probably predict how it works. There are the following two points:
the local variable scope is an array pointed to by ruby_scope->local_vars
the correspondence between each local variable name and each array index has already been resolved at the parser level
Therefore, the code for the local variable reference node NODE_LVAR is as follows:
▼ rb_eval() − NODE_LVAR
case NODE_LVAR:
  if (ruby_scope->local_vars == 0) {
      rb_bug("unexpected local variable");
  }
  result = ruby_scope->local_vars[node->nd_cnt];
  break;
(eval.c)
It goes without saying, but node->nd_cnt is the value that local_cnt() of the parser returns.
Constant
Complete Specification
In Chapter 6: Variables and constants, I talked about the form in which constants are stored and about the API. Constants belong to classes and are inherited in the same way as methods. As for their actual form, they are registered in iv_tbl of struct RClass together with instance variables and class variables.
The search path of a constant is first the outer classes and second the superclasses; however, rb_const_get() only searches the superclasses. Why? To answer this question, I need to reveal the last specification of constants. Take a look at the following code:
class A
  C = 5
  def A.new
    puts C
    super
  end
end
A.new is a singleton method of A, so its class is the singleton class (A). If it were interpreted by following the rule, it could not obtain the constant C, which belongs to A.
But because it is written so close by, it is human nature to want to refer to the constant C. Therefore, such a reference is possible in Ruby. It can be said that this specification reflects a characteristic of Ruby: “the emphasis is on the appearance of the source code”.
To generalize this rule: when a constant is referred to from inside a method, the search starts from the place where the method definition is “written”, and it refers to the constants of the outer classes from there.
And “the class where the method is written” depends on the context, so it cannot be handled without information from both the parser and the evaluator. This is why rb_const_get() does not have the search path of the outer classes.
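The search order stated above (first the classes the code is written inside, then the superclasses) can be sketched as two loops over two different chains. This is a simplified model under my own assumptions, not a transcription of ruby’s functions: struct klass holds at most one constant, struct cref is a made-up nesting list, and lookup_const() and NOT_FOUND are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define NOT_FOUND (-1)

struct klass {                       /* hypothetical miniature class */
    const char *const_name;          /* at most one constant, for brevity */
    int const_value;
    const struct klass *super;       /* superclass, or NULL */
};

struct cref {                        /* lexical nesting, innermost first */
    const struct klass *klass;
    const struct cref *next;
};

/* Looks only at the constants registered directly on k. */
static int
own_const(const struct klass *k, const char *name, int *out)
{
    if (k->const_name && strcmp(k->const_name, name) == 0) {
        *out = k->const_value;
        return 1;
    }
    return 0;
}

/* First walks the lexically enclosing classes, then falls back to the
 * superclass chain of the innermost class. */
static int
lookup_const(const struct cref *cref, const char *name)
{
    const struct cref *c;
    const struct klass *k;
    int value;

    assert(cref != NULL);
    for (c = cref; c; c = c->next)
        if (own_const(c->klass, name, &value)) return value;
    for (k = cref->klass; k; k = k->super)
        if (own_const(k, name, &value)) return value;
    return NOT_FOUND;
}
```

The first loop needs the nesting list, which only the evaluator has at hand, while the second loop needs nothing but the class itself. That split mirrors the division of labor described above.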
cbase
Then, let’s look at the code that refers to constants, outer classes included. Ordinary constant references, to which :: is not attached, become NODE_CONST in the syntax tree. The corresponding code in rb_eval() is…
▼ rb_eval() − NODE_CONST
case NODE_CONST:
  result = ev_const_get(RNODE(ruby_frame->cbase), node->nd_vid, self);
  break;
(eval.c)
First, nd_vid appears to be Variable ID and probably means the constant name. And ruby_frame->cbase is “the class where the method definition is written”. The value is set when the method is invoked, so the code that sets it has not appeared yet. The value to be set comes from the nd_rval that appeared in copy_node_scope() in the method definition. I’d like you to go back a little and check that this member holds the ruby_cref at the time the method is defined.
This means that, first, the ruby_cref link is built when defining a class or a module. Assume that the class just defined is C (Fig.8-1).
When the method m (this is probably C#m) is defined here, the current ruby_cref is remembered by the method entry (Fig.8-2).
After that, when the class statement finishes, ruby_cref starts to point to another node, but node->nd_rval naturally continues to point to the same thing (Fig.8-3).
Then, when the method C#m is invoked, node->nd_rval is fetched and put into the just-pushed ruby_frame->cbase (Fig.8-4).
… This is the mechanism. Complicated.
Fig.8. CREF Transfer
ev_const_get()
Now, let’s go back to the code of NODE_CONST. Since only ev_const_get() is left, we’ll look at it.
▼ ev_const_get()
static VALUE
ev_const_get(cref, id, self)
    NODE *cref;
    ID id;
    VALUE self;
{
    NODE *cbase = cref;
    VALUE result;

    while (cbase && cbase->nd_next) {
        VALUE klass = cbase->nd_clss;

        if (NIL_P(klass)) return rb_const_get(CLASS_OF(self), id);
        if (RCLASS(klass)->iv_tbl &&
            st_lookup(RCLASS(klass)->iv_tbl, id, &result)) {
            return result;
        }
        cbase = cbase->nd_next;
    }
    return rb_const_get(cref->nd_clss, id);
}
(eval.c)
((According to the errata, the description of ev_const_get() was wrong. I omit this part for now.))
Class variable
What class variables refer to is also ruby_cref. Needless to say, unlike constants, which search over the outer classes one after another, only the first element is used. Let’s look at the code of NODE_CVAR, which is the node for referring to a class variable.
What is cvar_cbase()? Since it has cbase in its name, it is probably related to ruby_frame->cbase, but how do they differ? Let’s look at it.
▼ cvar_cbase()
static VALUE
cvar_cbase()
{
    NODE *cref = RNODE(ruby_frame->cbase);

    while (cref && cref->nd_next &&
           FL_TEST(cref->nd_clss, FL_SINGLETON)) {
        cref = cref->nd_next;
        if (!cref->nd_next) {
            rb_warn("class variable access from toplevel singleton method");
        }
    }
    return cref->nd_clss;
}
(eval.c)
It seems to traverse cbase up to the first class that is not a singleton class. This feature was added to cope with code like the following:
class C                           class C
  @@cvar = 1                        @@cvar = 1
  class << C                        def C.m
    def m                             @@cvar
      @@cvar                        end
    end                             def C.m2
    def m2                            @@cvar + @@cvar
      @@cvar + @@cvar               end
    end                           end
  end
end
Both the left and the right code end up defining the same methods, but if you write it the way the right side does, it becomes tedious to write the class name repeatedly as the number of methods increases. Therefore, when defining multiple singleton methods, many people choose to write it the way the left side does, bundling them with the singleton class definition statement.
However, these two differ in the value of ruby_cref. With the singleton class definition it is ruby_cref=(C), and with separately defined singleton methods it is ruby_cref=C. This could make class variables refer to different places, which is inconvenient.
Therefore, assuming that defining class variables on singleton classes is rare, singleton classes are skipped. This again reflects that the emphasis is more on usability than on consistency.
And in the case of a constant reference, since all of the outer classes are searched, C is included in the search path either way, so there is no problem. Also, as for assignment, since it could not be written inside methods in the first place, it is not relevant either.
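The skipping rule can be checked on a toy model of the two situations. In this sketch struct cref and cvar_base() are made up, an is_singleton member stands in for the FL_SINGLETON test, classes are plain integers, and the warning branch of the real function is left out.

```c
#include <assert.h>
#include <stddef.h>

struct cref {                      /* hypothetical nesting entry */
    int klass;
    int is_singleton;              /* stands in for FL_TEST(klass, FL_SINGLETON) */
    const struct cref *next;
};

/* Skips singleton classes at the front of the chain, but never walks
 * past the last entry. */
static int
cvar_base(const struct cref *cref)
{
    assert(cref != NULL);
    while (cref->next && cref->is_singleton)
        cref = cref->next;
    return cref->klass;
}
```

With a chain of (C), C, Object for the left-hand style and C, Object for the right-hand style, both give C as the class whose class variables are used.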
Multiple Assignment
If someone asked “where is the most complicated specification of Ruby?”, I would instantly answer that it is multiple assignment. It is impossible even to grasp the big picture of multiple assignment, and I have a reason for thinking so. In short, the specification of multiple assignment was defined without even the slightest intention of making the whole specification well organized. The basis of the specification is always “the behavior that seems convenient in several typical use cases”. This can be said about the whole of Ruby, but particularly about multiple assignment.
Then, how can we avoid getting lost in the jungle of code? This is similar to reading the stateful scanner: the trick is not to look for the whole picture. There is no whole picture in the first place, so we could not see it anyway. Cutting the code into blocks, such as “this code is written for this specification, that code is written for that specification, …”, and understanding the correspondences one by one in this manner is the only way.
But this book is meant for understanding the overall structure of ruby and is not “Advanced Ruby Programming”. Dealing with very tiny things is therefore not fruitful. So here, we only think about the basic structure of multiple assignment and the very simple “multiple-to-multiple” case.
First, following the standard procedure, let’s start with the syntax tree.
▼ The Source Program
a, b = 7, 8
▼ Its Syntax Tree
NODE_MASGN
nd_head:
    NODE_ARRAY [
    0:
        NODE_LASGN
        nd_cnt = 2
        nd_value:
    1:
        NODE_LASGN
        nd_cnt = 3
        nd_value:
    ]
nd_value:
    NODE_REXPAND
    nd_head:
        NODE_ARRAY [
        0:
            NODE_LIT
            nd_lit = 7:Fixnum
        1:
            NODE_LIT
            nd_lit = 8:Fixnum
        ]
Both the left-hand and right-hand sides are lists of NODE_ARRAY, and on the right side there is additionally a NODE_REXPAND. REXPAND may stand for “Right value EXPAND”. We are curious about what this node does. Let’s see.
▼ rb_eval() − NODE_REXPAND
case NODE_REXPAND:
  result = avalue_to_svalue(rb_eval(self, node->nd_head));
  break;
(eval.c)
You can ignore avalue_to_svalue(). NODE_ARRAY is evaluated by rb_eval() and, because it is the node of the array literal, is turned into a Ruby array and returned. So everything on the right-hand side is evaluated before the left-hand side is handled. This even enables the following code:
a, b = b, a    # swap variables in one line
Let’s look at the NODE_MASGN of the left-hand side.
▼ rb_eval() − NODE_MASGN
case NODE_MASGN:
  result = massign(self, node, rb_eval(self, node->nd_value),0);
  break;
(eval.c)
Here only the right-hand side is evaluated; the rest is delegated to massign().
massign()
▼ massi……
static VALUE
massign(self, node, val, pcall)
    VALUE self;
    NODE *node;
    VALUE val;
    int pcall;
{
(eval.c)
I’m sorry to stop halfway, but I’d like you to pay attention to the fourth argument. pcall is Proc CALL; it indicates whether or not the function is being used to call a Proc object. Between Proc calls and the other cases there is a small difference in how strictly the multiple assignment is checked, so a flag is received to select the check. Obviously, the value is either 0 or 1.
Now look at the previous code that calls massign(): it was pcall=0. Therefore, for the time being we can probably assume pcall=0 and substitute that value for the variable. That is, when there is an argument like pcall that slightly changes the behavior, we always have to consider both scenarios, which is really cumbersome. Even if there is actually only one function massign(), it is far simpler to read it as if there were two functions, one for pcall=0 and one for pcall=1.
When writing a program we must avoid duplication as much as possible, but this principle does not apply when reading. If the patterns are limited, copying the code and letting it be redundant is rather the right approach. There are phrases like “optimize for speed” and “optimize for code size”; in this case we’ll “optimize for readability”.
So, assuming pcall=0 and cutting the code as much as possible, the final appearance is as follows:
▼ massign() (simplified)
static VALUE
massign(self, node, val  /* , pcall=0 */)
    VALUE self;
    NODE *node;
    VALUE val;
{
    NODE *list;
    long i = 0, len;

    val = svalue_to_mvalue(val);
    len = RARRAY(val)->len;
    list = node->nd_head;
    /* (A) */
    for (i=0; list && i<len; i++) {
        assign(self, list->nd_head, RARRAY(val)->ptr[i], pcall);
        list = list->nd_next;
    }
    /* (B) */
    if (node->nd_args) {
        if (node->nd_args == (NODE*)-1) {
            /* no check for mere `*' */
        }
        else if (!list && i<len) {
            assign(self, node->nd_args,
                   rb_ary_new4(len-i, RARRAY(val)->ptr+i), pcall);
        }
        else {
            assign(self, node->nd_args, rb_ary_new2(0), pcall);
        }
    }

    /* (C) */
    while (list) {
        i++;
        assign(self, list->nd_head, Qnil, pcall);
        list = list->nd_next;
    }
    return val;
}
val is the right-hand side value. And there is the suspicious conversion called svalue_to_mvalue(); since mvalue_to_svalue() appeared previously and svalue_to_mvalue() appears this time, you can infer that “it must be converting it back”. ((errata: it was avalue_to_svalue() in the previous case. Therefore, it is hard to infer “converting back”, but you can ignore them anyway.)) Thus, both are deleted. In the next line, since RARRAY() is used, you can infer that the right-hand side value is a Ruby Array. Meanwhile, the left-hand side is node->nd_head, which is the value assigned to the local variable list. This list is also a node (NODE_ARRAY).
We’ll look at the code clause by clause.
(A) assign() is, as the name suggests, a function that performs a one-to-one assignment. Since the left-hand side is expressed by a node, if it is, for instance, NODE_IASGN (an assignment to an instance variable), it assigns with rb_ivar_set(). So what is done here is to go along whichever of list and val is shorter and do one-to-one assignments (Fig.9).
Fig.9. assign when corresponded
(B) If there are leftovers on the right-hand side, turn them into a Ruby array and assign it to (the left-hand side expressed by) node->nd_args.
(C) If there are leftovers on the left-hand side, assign nil to all of them.
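The three clauses can be tried out on plain integer arrays. The sketch below is my own simplification and not massign() itself: multi_assign() and NIL are hypothetical, left-hand sides are just int slots, and the rest variable is an int buffer supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>

#define NIL (-1)       /* stand-in for nil */

/* Assigns rhs[0..nrhs) to lhs[0..nlhs) following clauses (A) to (C).
 * If rest is not NULL it receives the leftover right-hand values
 * (at most rest_cap of them) and *nrest receives their count. */
static void
multi_assign(int *lhs, size_t nlhs, const int *rhs, size_t nrhs,
             int *rest, size_t rest_cap, size_t *nrest)
{
    size_t i, j;

    /* (A) one-to-one while both sides last */
    for (i = 0; i < nlhs && i < nrhs; i++)
        lhs[i] = rhs[i];

    /* (B) leftovers on the right go to the rest variable, if there is one */
    if (rest) {
        assert(nrest != NULL);
        *nrest = 0;
        for (j = i; j < nrhs && *nrest < rest_cap; j++)
            rest[(*nrest)++] = rhs[j];
    }

    /* (C) leftovers on the left become nil */
    for (; i < nlhs; i++)
        lhs[i] = NIL;
}
```

For example, three left-hand slots and the values 7, 8 give 7, 8, NIL, while one slot plus a rest buffer and the values 1, 2, 3 give 1 in the slot and 2, 3 in the rest buffer.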
By the way, the procedure of assuming pcall=0 and then cutting things out is very similar to the data flow analysis and constant folding used in the optimization phase of compilers. Therefore, we can probably automate it to some extent.
The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License
Chapter 15: Methods
In this chapter, I’ll talk about method searching and invoking.
Searching methods
Terminology
In this chapter, both method calls and method definitions are discussed, and a great variety of “arguments” will appear. Therefore, to avoid confusion, let’s strictly define the terms here:
m(a)          # a is a "normal argument"
m(*list)      # list is an "array argument"
m(&block)     # block is a "block argument"

def m(a)      # a is a "normal parameter"
def m(a=nil)  # a is an "option parameter", nil is "its default value".
def m(*rest)  # rest is a "rest parameter"
def m(&block) # block is a "block parameter"
In short, they are all “arguments” when passing and “parameters” when receiving, and an adjective is attached according to the kind.
However, of the above, the “block arguments” and the “block parameters” will be discussed in the next chapter.
Investigation
▼ The Source Program
obj.method(7,8)
▼ Its Syntax Tree
NODE_CALL
nd_mid = 9049 (method)
nd_recv:
    NODE_VCALL
    nd_mid = 9617 (obj)
nd_args:
    NODE_ARRAY [
    0:
        NODE_LIT
        nd_lit = 7:Fixnum
    1:
        NODE_LIT
        nd_lit = 8:Fixnum
    ]
The node for a method call is NODE_CALL. nd_args holds the arguments as a list of NODE_ARRAY.
Additionally, NODE_FCALL and NODE_VCALL also exist as nodes for method calls. NODE_FCALL is for the “method(args)” form, and NODE_VCALL corresponds to method calls in the “method” form, which is the same form as local variables. FCALL and VCALL could actually be merged into one, but because there is no need to prepare arguments for a VCALL, they are kept separate solely to save the time and memory that would take.
Now, let’s look at the handler of NODE_CALL in rb_eval().
▼ rb_eval() − NODE_CALL
case NODE_CALL:
  {
      VALUE recv;
      int argc; VALUE *argv; /* used in SETUP_ARGS */
      TMP_PROTECT;

      BEGIN_CALLARGS;
      recv = rb_eval(self, node->nd_recv);
      SETUP_ARGS(node->nd_args);
      END_CALLARGS;

      SET_CURRENT_SOURCE();
      result = rb_call(CLASS_OF(recv),recv,node->nd_mid,argc,argv,0);
  }
  break;
(eval.c)
The problems are probably the three macros BEGIN_CALLARGS, SETUP_ARGS(), and END_CALLARGS. Since rb_eval() evaluates the receiver and rb_call() invokes the method, we can roughly imagine that the evaluation of the arguments is done in these three macros, but what is actually done? BEGIN_CALLARGS and END_CALLARGS are difficult to understand before talking about iterators, so they are explained in the next chapter, “Block”. Here, let’s investigate only SETUP_ARGS().
SETUP_ARGS()
SETUP_ARGS() is the macro that evaluates the arguments of a method.
Inside this macro, as the comment in the original program says, the variables named argc and argv are used, so they must be defined in advance. And because it uses TMP_ALLOC(), TMP_PROTECT must also be written in advance. Therefore, something like the following is boilerplate:
int argc; VALUE *argv;   /* used in SETUP_ARGS */
TMP_PROTECT;

SETUP_ARGS(args_node);
args_node is (the node that represents) the arguments of the method; the macro turns it into an array of the values obtained by evaluating it and stores that in argv. Let’s look at it:
▼ SETUP_ARGS()
#define SETUP_ARGS(anode) do { \
    NODE *n = anode; \
    if (!n) {                      /* no arguments */ \
        argc = 0; \
        argv = 0; \
    } \
    else if (nd_type(n) == NODE_ARRAY) {  /* only normal arguments */ \
        argc = n->nd_alen; \
        if (argc > 0) {            /* arguments present */ \
            int i; \
            n = anode; \
            argv = TMP_ALLOC(argc); \
            for (i=0;i<argc;i++) { \
                argv[i] = rb_eval(self,n->nd_head); \
                n = n->nd_next; \
            } \
        } \
        else {                     /* no arguments */ \
            argc = 0; \
            argv = 0; \
        } \
    } \
    else {       /* both or one of an array argument and a block argument */ \
        VALUE args = rb_eval(self,n); \
        if (TYPE(args) != T_ARRAY) \
            args = rb_ary_to_ary(args); \
        argc = RARRAY(args)->len; \
        argv = ALLOCA_N(VALUE, argc); \
        MEMCPY(argv, RARRAY(args)->ptr, VALUE, argc); \
    } \
} while (0)
(eval.c)
This is a bit long, but since it clearly branches in three ways, it is actually not so terrible. The meaning of each branch is written as a comment.
We don’t have to care about the case with no arguments; the two remaining branches do similar things. Roughly speaking, what they do consists of three steps:
allocate a space to store the arguments
evaluate the expressions of the arguments
copy the values into the variable space
Written as code (and tidied up a little), it becomes as follows.
/***** else if clause, argc!=0 *****/
int i;
n = anode;
argv = TMP_ALLOC(argc);                         /* 1 */
for (i = 0; i < argc; i++) {
    argv[i] = rb_eval(self, n->nd_head);        /* 2,3 */
    n = n->nd_next;
}

/***** else clause *****/
VALUE args = rb_eval(self, n);                  /* 2 */
if (TYPE(args) != T_ARRAY)
    args = rb_ary_to_ary(args);
argc = RARRAY(args)->len;
argv = ALLOCA_N(VALUE, argc);                   /* 1 */
MEMCPY(argv, RARRAY(args)->ptr, VALUE, argc);   /* 3 */
TMP_ALLOC() is used in the else if side, but ALLOCA_N(), which is the ordinary alloca(), is used in the else side. Why? Isn’t that dangerous in the C_ALLOCA environment, where alloca() is equivalent to malloc()?
The point is that “in the else side the values of the arguments are also stored in args”. Illustrated, it looks like Figure 1.
Figure 1: Being in the heap is all right.
If at least one VALUE is on the stack, the others can be marked successively through it. This kind of VALUE acts like an anchor that ties the other VALUEs to the stack. Namely, it becomes an “anchor VALUE”. In the else side, args is the anchor VALUE.
For your information, “anchor VALUE” is a term I have just coined.
rb_call()
SETUP_ARGS() was something of a detour. Let’s go back to the main line. The function that invokes a method is rb_call(). The original contains code such as raising an exception when nothing can be found; as usual, I’ll skip all of it.
▼ rb_call() (simplified)
static VALUE
rb_call(klass, recv, mid, argc, argv, scope)
    VALUE klass, recv;
    ID    mid;
    int argc;
    const VALUE *argv;
    int scope;
{
    NODE  *body;
    int    noex;
    ID     id = mid;
    struct cache_entry *ent;

    /* search over method cache */
    ent = cache + EXPR1(klass, mid);
    if (ent->mid == mid && ent->klass == klass) {
        /* cache hit */
        klass = ent->origin;
        id    = ent->mid0;
        noex  = ent->noex;
        body  = ent->method;
    }
    else {
        /* cache miss, searching step-by-step */
        body = rb_get_method_body(&klass, &id, &noex);
    }

    /* ... check the visibility ... */

    return rb_call0(klass, recv, mid, id,
                    argc, argv, body, noex & NOEX_UNDEF);
}
The basic way of searching methods was discussed in chapter 2: “Object”. It follows the superclasses and searches each m_tbl. This is done by search_method().
The principle is certainly this, but when it comes to actual execution, looking up hash tables many times for every method call would be too slow. To improve this, ruby caches a method once it has been called. If a method is called once, it is often called again immediately. This is known as an empirical fact, and this cache achieves a high hit rate.
The cache lookup is the first half of rb_call(). With only this line,
ent = cache + EXPR1(klass, mid);
the cache is searched. We’ll examine its mechanism in detail later.
When the cache is not hit, the following rb_get_method_body() searches the class tree step-by-step and caches the result at the same time. Figure 2 shows the entire flow of searching.
Figure 2: Method Search
Method Cache
Next, let’s examine the structure of the method cache in detail.
▼ Method Cache
#define CACHE_SIZE 0x800
#define CACHE_MASK 0x7ff
#define EXPR1(c,m) ((((c)>>3)^(m))&CACHE_MASK)

struct cache_entry {            /* method hash table. */
    ID mid;                     /* method's id */
    ID mid0;                    /* method's original id */
    VALUE klass;                /* receiver's class */
    VALUE origin;               /* where method defined */
    NODE *method;
    int noex;
};

static struct cache_entry cache[CACHE_SIZE];
(eval.c)
Put briefly, the mechanism is a hash table. I mentioned that the principle of a hash table is to convert a table search into the indexing of an array. Three things are necessary to accomplish this: an array to store the data, a key, and a hash function.
First, the array here is an array of struct cache_entry. A method is uniquely determined by only the class and the method name, so these two become the key of the hash calculation. The rest is done by creating a hash function that generates the index (0x000 ~ 0x7ff) of the cache array from the key. That is EXPR1(). Among its arguments, c is the class object and m is the method name (ID). (Figure 3)
Figure 3: Method Cache
However, EXPR1() is not a perfect hash function or anything, so different methods can coincidentally generate the same index. But because this is nothing more than a cache, conflicts do not cause a problem. They just slow performance down a little.
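To make the “conflicts are harmless” point concrete, here is a minimal sketch of a direct-mapped cache indexed the same way as EXPR1(). Everything prefixed with toy_ or CACHE_INDEX is an assumption made for illustration (plain integers stand in for VALUE, ID and NODE*); it is not the code in eval.c.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_SIZE 0x800
#define CACHE_MASK 0x7ff
/* Same shape as EXPR1(): mix the class bits with the method ID. */
#define CACHE_INDEX(c, m) (((((uintptr_t)(c)) >> 3) ^ (uintptr_t)(m)) & CACHE_MASK)

struct toy_entry {
    uintptr_t klass;   /* stand-in for the receiver's class */
    uintptr_t mid;     /* stand-in for the method ID */
    int       body;    /* stand-in for the method body; 0 means "empty" */
};

static struct toy_entry toy_cache[CACHE_SIZE];

/* Returns the cached body, or 0 on a cache miss. */
static int toy_cache_lookup(uintptr_t klass, uintptr_t mid)
{
    struct toy_entry *ent = toy_cache + CACHE_INDEX(klass, mid);
    if (ent->body != 0 && ent->klass == klass && ent->mid == mid)
        return ent->body;
    return 0;
}

/* Stores an entry, silently overwriting whatever shared the slot. */
static void toy_cache_store(uintptr_t klass, uintptr_t mid, int body)
{
    struct toy_entry *ent = toy_cache + CACHE_INDEX(klass, mid);
    ent->klass = klass;
    ent->mid   = mid;
    ent->body  = body;
}
```

Because the lookup compares the stored key with the requested one, a conflicting entry is only ever reported as a miss, never as a wrong hit. The caller then falls back to the slow search, which is exactly the “slows down a little” case.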
The effect of Method Cache
By the way, how effective is the method cache in reality? We cannot be convinced just by being told “it is known as …”. Let’s measure it ourselves.
Type                        Program                 Hit Rate
generating LALR parser      racc ruby.y             99.9%
generating a mail thread    a mailer                99.1%
generating a document       rd2html rubyrefm.rd     97.8%
Surprisingly, in all three experiments the hit rate is more than 95%. This is awesome. Apparently, the effect of “it is known as …” is outstanding.
Invocation
rb_call0()
A lot has happened, and we have finally arrived at the method invocation. However, this rb_call0() is huge. As it is more than 200 lines, it would come to 5 or 6 pages. Laying out the whole thing here would be disastrous. Let’s look at it by dividing it into small portions. Starting with the outline:
▼ rb_call0() (Outline)
static VALUE
rb_call0(klass, recv, id, oid, argc, argv, body, nosuper)
    VALUE klass, recv;
    ID    id;
    ID    oid;
    int argc;                   /* OK */
    VALUE *argv;                /* OK */
    NODE *body;                 /* OK */
    int nosuper;
{
    NODE *b2;                   /* OK */
    volatile VALUE result = Qnil;
    int itr;
    static int tick;
    TMP_PROTECT;

    switch (ruby_iter->iter) {
      case ITER_PRE:
        itr = ITER_CUR;
        break;
      case ITER_CUR:
      default:
        itr = ITER_NOT;
        break;
    }

    if ((++tick & 0xff) == 0) {
        CHECK_INTS;             /* better than nothing */
        stack_check();
    }
    PUSH_ITER(itr);
    PUSH_FRAME();

    ruby_frame->last_func = id;
    ruby_frame->orig_func = oid;
    ruby_frame->last_class = nosuper?0:klass;
    ruby_frame->self = recv;
    ruby_frame->argc = argc;
    ruby_frame->argv = argv;

    switch (nd_type(body)) {
        /* ... main process ... */

      default:
        rb_bug("unknown node type %d", nd_type(body));
        break;
    }
    POP_FRAME();
    POP_ITER();
    return result;
}
(eval.c)
First, an ITER is pushed, and whether or not the method is an iterator is finally fixed. As its value is used by the PUSH_FRAME() that comes immediately after it, PUSH_ITER() needs to appear beforehand. PUSH_FRAME() will be discussed soon.
To describe the “… main process …” part first, it branches on the following node types, and each branch does its own invocation process.
NODE_CFUNC      methods defined in C
NODE_IVAR       attr_reader
NODE_ATTRSET    attr_writer
NODE_SUPER      super
NODE_ZSUPER     super without arguments
NODE_DMETHOD    invoke UnboundMethod
NODE_BMETHOD    invoke Method
NODE_SCOPE      methods defined in Ruby
Some of the above nodes are not explained in this book, but they are not so important and can be ignored. The important ones are only NODE_CFUNC, NODE_SCOPE and NODE_ZSUPER.
PUSH_FRAME()
▼ PUSH_FRAME() POP_FRAME()
#define PUSH_FRAME() do {               \
    struct FRAME _frame;                \
    _frame.prev = ruby_frame;           \
    _frame.tmp  = 0;                    \
    _frame.node = ruby_current_node;    \
    _frame.iter = ruby_iter->iter;      \
    _frame.cbase = ruby_frame->cbase;   \
    _frame.argc = 0;                    \
    _frame.argv = 0;                    \
    _frame.flags = FRAME_ALLOCA;        \
    ruby_frame = &_frame

#define POP_FRAME()                     \
    ruby_current_node = _frame.node;    \
    ruby_frame = _frame.prev;           \
} while (0)
(eval.c)
First, let’s confirm that the entire FRAME is allocated on the stack. This is identical to module_setup(). The rest is basically just ordinary initialization.
One more point: the flag FRAME_ALLOCA indicates the allocation method of the FRAME. FRAME_ALLOCA obviously indicates “it is on the stack”.
rb_call0() – NODE_CFUNC
A lot of things are written in this part of the original code, but most of them are related to trace_func, and the substantive code is only the following line:
▼ rb_call0() − NODE_CFUNC (simplified)
case NODE_CFUNC:
  result = call_cfunc(body->nd_cfnc, recv, len, argc, argv);
  break;
Then, as for call_cfunc() …
▼ call_cfunc() (simplified)
static VALUE
call_cfunc(func, recv, len, argc, argv)
    VALUE (*func)();
    VALUE recv;
    int len, argc;
    VALUE *argv;
{
    if (len >= 0 && argc != len) {
        rb_raise(rb_eArgError, "wrong number of arguments(%d for %d)",
                 argc, len);
    }

    switch (len) {
      case -2:
        return (*func)(recv, rb_ary_new4(argc, argv));
        break;
      case -1:
        return (*func)(argc, argv, recv);
        break;
      case 0:
        return (*func)(recv);
        break;
      case 1:
        return (*func)(recv, argv[0]);
        break;
      case 2:
        return (*func)(recv, argv[0], argv[1]);
        break;
                :
                :
      default:
        rb_raise(rb_eArgError, "too many arguments(%d)", len);
        break;
    }
    return Qnil;                /* not reached */
}
(eval.c)
As shown above, it branches based on the argument count. The maximum argument count is 15.
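The arity dispatch can be tried out in isolation with a sketch like the following. It is an illustration under assumptions, not ruby’s code: VALUE is reduced to a long, a union (toy_cfunc) replaces the unprototyped VALUE (*func)() pointer so that each call goes through a correctly typed pointer, only the cases -1, 0, 1 and 2 are covered, and a hypothetical TOY_ARG_ERROR return value stands in for rb_raise().

```c
#include <assert.h>

typedef long VALUE;   /* stand-in for ruby's VALUE */

/* One member per calling convention, selected by len. */
union toy_cfunc {
    VALUE (*varargs)(int argc, VALUE *argv, VALUE recv);  /* len == -1 */
    VALUE (*arity0)(VALUE recv);                          /* len ==  0 */
    VALUE (*arity1)(VALUE recv, VALUE a);                 /* len ==  1 */
    VALUE (*arity2)(VALUE recv, VALUE a, VALUE b);        /* len ==  2 */
};

#define TOY_ARG_ERROR (-1L)  /* hypothetical marker used instead of rb_raise() */

static VALUE toy_call_cfunc(union toy_cfunc f, VALUE recv,
                            int len, int argc, VALUE *argv)
{
    /* A fixed arity must match the actual argument count exactly. */
    if (len >= 0 && argc != len)
        return TOY_ARG_ERROR;

    switch (len) {
      case -1: return f.varargs(argc, argv, recv);
      case 0:  return f.arity0(recv);
      case 1:  return f.arity1(recv, argv[0]);
      case 2:  return f.arity2(recv, argv[0], argv[1]);
      default: return TOY_ARG_ERROR;   /* arity not supported by this sketch */
    }
}

/* Two sample "C methods": a fixed-arity one and a variable-arity one. */
static VALUE toy_add(VALUE recv, VALUE a)
{
    return recv + a;
}

static VALUE toy_sum(int argc, VALUE *argv, VALUE recv)
{
    VALUE total = recv;
    int i;
    for (i = 0; i < argc; i++)
        total += argv[i];
    return total;
}
```

Note how the receiver moves: for fixed arities it is the first argument, while for len == -1 it comes last, after argc and argv.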
Note that neither SCOPE nor VARS is pushed when it is NODE_CFUNC. This makes sense because a method defined in C does not use Ruby’s local variables. But it also means that if the “current” local variables are accessed from C, they are actually the local variables of the previous FRAME. And in some places, say, rb_svar (eval.c), this is actually done.
rb_call0() – NODE_SCOPE
NODE_SCOPE is to invoke a method defined in Ruby. This part forms the foundation of Ruby.
▼ rb_call0() − NODE_SCOPE (outline)
case NODE_SCOPE:
  {
      int state;
      VALUE *local_vars;  /* OK */
      NODE *saved_cref = 0;

      PUSH_SCOPE();

      /* (A) forward CREF */
      if (body->nd_rval) {
          saved_cref = ruby_cref;
          ruby_cref = (NODE*)body->nd_rval;
          ruby_frame->cbase = body->nd_rval;
      }
      /* (B) initialize ruby_scope->local_vars */
      if (body->nd_tbl) {
          local_vars = TMP_ALLOC(body->nd_tbl[0]+1);
          *local_vars++ = (VALUE)body;
          rb_mem_clear(local_vars, body->nd_tbl[0]);
          ruby_scope->local_tbl = body->nd_tbl;
          ruby_scope->local_vars = local_vars;
      }
      else {
          local_vars = ruby_scope->local_vars = 0;
          ruby_scope->local_tbl  = 0;
      }
      b2 = body = body->nd_next;

      PUSH_VARS();
      PUSH_TAG(PROT_FUNC);

      if ((state = EXEC_TAG()) == 0) {
          NODE *node = 0;
          int i;

          /* …… (C) assign the arguments to the local variables …… */

          if (trace_func) {
              call_trace_func("call", b2, recv, id, klass);
          }
          ruby_last_node = b2;
          /* (D) method body */
          result = rb_eval(recv, body);
      }
      else if (state == TAG_RETURN) { /* back via return */
          result = prot_tag->retval;
          state = 0;
      }
      POP_TAG();
      POP_VARS();
      POP_SCOPE();
      ruby_cref = saved_cref;
      if (trace_func) {
          call_trace_func("return", ruby_last_node, recv, id, klass);
      }
      switch (state) {
        case 0:
          break;

        case TAG_RETRY:
          if (rb_block_given_p()) {
              JUMP_TAG(state);
          }
          /* fall through */
        default:
          jump_tag_but_local_jump(state);
          break;
      }
  }
  break;
(eval.c)
(A) CREF forwarding, which was described in the section on constants in the previous chapter. In other words, cbase is transplanted to FRAME from the method entry.
(B) The content here is completely identical to what is done at module_setup(). An array is allocated at local_vars of SCOPE. With this and PUSH_SCOPE() and PUSH_VARS(), the creation of the local variable scope is completed. After this, one can execute rb_eval() in exactly the same environment as the interior of the method.
(C) This sets the received arguments to the parameter variables. The parameter variables are in essence identical to the local variables. Things such as the number of arguments are specified by NODE_ARGS, so all it has to do is set them one by one. Details will be explained soon.
(D) This executes the method body. Obviously, the receiver (recv) becomes self. In other words, it becomes the first argument of rb_eval(). With this, the method is completely invoked.
Set Parameters
Then, we’ll examine the part that was entirely skipped, which sets the parameters. But before that, I’d like you to first check the syntax tree of the method again.
% ruby -rnodedump -e 'def m(a) nil end'
NODE_SCOPE
nd_rval = (null)
nd_tbl = 3 [ _ ~ a ]
nd_next:
    NODE_BLOCK
    nd_head:
        NODE_ARGS
        nd_cnt  = 1
        nd_rest = -1
        nd_opt = (null)
    nd_next:
        NODE_BLOCK
        nd_head:
            NODE_NEWLINE
            nd_file = "-e"
            nd_nth  = 1
            nd_next:
                NODE_NIL
        nd_next = (null)
NODE_ARGS is the node that specifies the parameters of a method. I aggressively dumped several things, and it seems its members are used as follows:
nd_cnt     the number of the normal parameters
nd_rest    the variable ID of the rest parameter. -1 if the rest parameter is missing
nd_opt     holds the syntax tree that represents the default values of the option parameters. a list of NODE_BLOCK
With this much information, the local variable ID of each parameter variable can be uniquely determined. First, I mentioned that 0 and 1 are always $_ and $~. From 2 onward, the ordinary parameters are lined up, as many as necessary. The number of option parameters can be determined from the length of the NODE_BLOCK list. Next to them comes the rest parameter.
For example, if you write a definition as below,
def m(a, b, c = nil, *rest)
  lvar1 = nil
end
local variable IDs are assigned as follows.
0   1   2   3   4   5      6
$_  $~  a   b   c   rest   lvar1
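The slot arithmetic behind this table can be written down as a small sketch. The struct toy_args and the toy_ helper functions are hypothetical names introduced only for this illustration; they mirror nd_cnt, the length of the nd_opt list, and the presence of a rest parameter, but ruby itself stores the rest parameter’s ID directly in nd_rest rather than computing it this way.

```c
#include <assert.h>

/* Hypothetical summary of a NODE_ARGS node. */
struct toy_args {
    int cnt;       /* number of normal parameters (nd_cnt) */
    int nopt;      /* number of option parameters (length of the nd_opt list) */
    int has_rest;  /* nonzero when a rest parameter exists */
};

enum { TOY_SPECIAL_SLOTS = 2 };  /* slots 0 and 1 hold $_ and $~ */

/* Slot of the i-th normal parameter (i counts from 0). */
static int toy_normal_slot(const struct toy_args *a, int i)
{
    (void)a;
    return TOY_SPECIAL_SLOTS + i;
}

/* Slot of the j-th option parameter (j counts from 0). */
static int toy_opt_slot(const struct toy_args *a, int j)
{
    return TOY_SPECIAL_SLOTS + a->cnt + j;
}

/* Slot of the rest parameter, or -1 when there is none. */
static int toy_rest_slot(const struct toy_args *a)
{
    return a->has_rest ? TOY_SPECIAL_SLOTS + a->cnt + a->nopt : -1;
}

/* Slot of the first local variable that is not a parameter. */
static int toy_first_plain_local(const struct toy_args *a)
{
    return TOY_SPECIAL_SLOTS + a->cnt + a->nopt + (a->has_rest ? 1 : 0);
}
```

For def m(a, b, c = nil, *rest) the summary would be {2, 1, 1}, which yields slots 2 and 3 for a and b, 4 for c, 5 for rest and 6 for lvar1, the same as the table.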
Are you still with me? Taking this into consideration, let’s look at the code.
▼ rb_call0() − NODE_SCOPE − assignments of arguments
if (nd_type(body) == NODE_ARGS) { /* no body */
    node = body;           /* NODE_ARGS */
    body = 0;              /* the method body */
}
else if (nd_type(body) == NODE_BLOCK) { /* has body */
    node = body->nd_head;  /* NODE_ARGS */
    body = body->nd_next;  /* the method body */
}
if (node) {  /* has some parameters */
    if (nd_type(node) != NODE_ARGS) {
        rb_bug("no argument-node");
    }

    i = node->nd_cnt;
    if (i > argc) {
        rb_raise(rb_eArgError, "wrong number of arguments(%d for %d)",
                 argc, i);
    }
    if (node->nd_rest == -1) {  /* no rest parameter */
        /* counting the number of parameters */
        int opt = i;   /* the number of parameters (i is nd_cnt) */
        NODE *optnode = node->nd_opt;

        while (optnode) {
            opt++;
            optnode = optnode->nd_next;
        }
        if (opt < argc) {
            rb_raise(rb_eArgError,
                     "wrong number of arguments(%d for %d)", argc, opt);
        }
        /* assigning for the second time in rb_call0 */
        ruby_frame->argc = opt;
        ruby_frame->argv = local_vars+2;
    }

    if (local_vars) { /* has parameters */
        if (i > 0) {             /* has normal parameters */
            /* +2 to skip the spaces for $_ and $~ */
            MEMCPY(local_vars+2, argv, VALUE, i);
        }
        argv += i; argc -= i;
        if (node->nd_opt) {      /* has option parameters */
            NODE *opt = node->nd_opt;

            while (opt && argc) {
                assign(recv, opt->nd_head, *argv, 1);
                argv++; argc--;
                opt = opt->nd_next;
            }
            if (opt) {
                rb_eval(recv, opt);
            }
        }
        local_vars = ruby_scope->local_vars;
        if (node->nd_rest >= 0) { /* has rest parameter */
            VALUE v;

            /* make an array of the remaining parameters
               and assign it to a variable */
            if (argc > 0)
                v = rb_ary_new4(argc,argv);
            else
                v = rb_ary_new2(0);
            ruby_scope->local_vars[node->nd_rest] = v;
        }
    }
}
(eval.c)
Since more comments are added than before, you should be able to understand what it is doing by following it step-by-step.
One thing I’d like to mention is argc and argv of ruby_frame. They seem to be updated only when there is no rest parameter. Why only in that case?
This point can be understood by thinking about the purpose of argc and argv. These members actually exist for super without arguments. It means the following form:
super
This super has the behavior of directly passing the parameters of the currently executing method. To make it possible to pass them at that moment, the arguments are saved in ruby_frame->argv.
Going back to the previous story, if there’s a rest parameter, passing the original parameter list somehow seems more convenient. If there’s not, the list after the option parameters have been assigned seems better.
def m(a, b, *rest)
  super     # probably 5, 6, 7, 8 should be passed
end
m(5, 6, 7, 8)

def m(a, b = 6)
  super     # probably 5, 6 should be passed
end
m(5)
This is a question of which is better as a specification rather than “it must be”. If a method has a rest parameter, the method of the superclass is supposed to have a rest parameter as well. Thus, if the processed values were passed, it would quite possibly be inconvenient.
Now, I’ve said various things, but the story of method invocation is all done. What remains, as the ending of this chapter, is to look at the implementation of super, which has just been discussed.
super
What corresponds to super are NODE_SUPER and NODE_ZSUPER. NODE_SUPER is ordinary super, and NODE_ZSUPER is super without arguments.
▼ rb_eval() − NODE_SUPER
case NODE_SUPER:
case NODE_ZSUPER:
  {
      int argc; VALUE *argv; /* used in SETUP_ARGS */
      TMP_PROTECT;

      /* (A) case when super is forbidden */
      if (ruby_frame->last_class == 0) {
          if (ruby_frame->orig_func) {
              rb_name_error(ruby_frame->last_func,
                            "superclass method `%s' disabled",
                            rb_id2name(ruby_frame->orig_func));
          }
          else {
              rb_raise(rb_eNoMethodError,
                       "super called outside of method");
          }
      }
      /* (B) set up or evaluate parameters */
      if (nd_type(node) == NODE_ZSUPER) {
          argc = ruby_frame->argc;
          argv = ruby_frame->argv;
      }
      else {
          BEGIN_CALLARGS;
          SETUP_ARGS(node->nd_args);
          END_CALLARGS;
      }

      /* (C) yet mysterious PUSH_ITER() */
      PUSH_ITER(ruby_iter->iter?ITER_PRE:ITER_NOT);
      SET_CURRENT_SOURCE();
      result = rb_call(RCLASS(ruby_frame->last_class)->super,
                       ruby_frame->self, ruby_frame->orig_func,
                       argc, argv, 3);
      POP_ITER();
  }
  break;
(eval.c)
For super without arguments, I said that ruby_frame->argv is directly used as the arguments; this is directly shown at (B).
(C) Just before calling rb_call(), it does PUSH_ITER(). This also cannot be explained in detail yet, but in this way the block passed to the current method can be handed over to the next method (meaning, the method of the superclass that is going to be called).
And finally, (A) when ruby_frame->last_class is 0, calling super seems to be forbidden. Since the error message says “must be enabled by rb_enable_super()”, it seems it becomes callable by calling rb_enable_super(). ((errata: The error message “must be enabled by rb_enable_super()” exists not in this list but in rb_call_super().)) Why?
First, if we investigate in what kind of situation last_class becomes 0, it seems to be while executing a method whose substance is defined in C (NODE_CFUNC). Moreover, it is the same when aliasing or replacing such a method.
I understood that much, but even after reading the source code I couldn’t understand what follows from it. Because I couldn’t, I searched for “rb_enable_super” in ruby’s mailing list archives and found it. According to that mail, the situation looks as follows:
For example, there’s a method named String.new. Of course, this is a method to create a string. String.new creates a struct of T_STRING. Therefore, when writing an instance method of String, you can expect that the receiver is always of T_STRING.
Then, super of String.new is Object.new. Object.new creates a struct of T_OBJECT. What happens if String.new is replaced by a new definition and super is called?
def String.new
  super
end
As a consequence, an object whose struct is of T_OBJECT but whose class is String is created. However, the methods of String are written with the expectation of a struct of T_STRING, so naturally it crashes.
How can we avoid this? The answer is to forbid calling any method that expects a struct of a different struct type. But the information of the “expected struct type” is attached neither to the method nor to the class. For example, if there were a way to obtain T_STRING from the String class, it could be checked before calling, but currently we can’t do such a thing. Therefore, as the second-best plan, “super from methods defined in C is forbidden” is defined. In this way, if the layer of methods at C level is precisely created, at least it cannot be brought down. And, when the case is “It’s absolutely safe, so allow super”, super can be enabled by calling rb_enable_super().
In short, the heart of the problem is the mismatch of struct types. This is the same as the problem that occurs in the allocation framework.
Then, the way to solve this is to solve the root of the problem, namely that “the class does not know the struct type of the instance”. But in order to resolve this, at least a new API is necessary, and if it is done more deeply, compatibility will be lost. Therefore, for the time being, the final solution has not been decided yet.
The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
Chapter 16: Blocks
Iterator
In this chapter, BLOCK, which is the last big name among the seven Ruby stacks, comes in. After finishing this, the internal state of the evaluator is virtually understood.
The Whole Picture
What is the mechanism of iterators? First, let’s think about a small program as below:
▼ The Source Program
iter_method() do
  9   # a mark to find this block
end
Let’s check the terms just in case. In this program, iter_method is an iterator method and do ~ end is an iterator block. Here is the dumped syntax tree of this program.
▼ Its Syntax Tree
NODE_ITER
nd_iter:
    NODE_FCALL
    nd_mid = 9617 (iter_method)
    nd_args = (null)
nd_var = (null)
nd_body:
    NODE_LIT
    nd_lit = 9:Fixnum
Looking for the block by using the 9 written in the iterator block as a trace, we can understand that NODE_ITER seems to represent the iterator block. And NODE_FCALL, which calls iter_method, is “below” that NODE_ITER. In other words, the node of the iterator block appears earlier than the call of the iterator method. This means that, before an iterator method is called, a block is pushed at another node.
And by following the flow of the code with a debugger, I found that the invocation of an iterator is separated into 3 steps: NODE_ITER, NODE_CALL and NODE_YIELD. This means,
1. push a block (NODE_ITER)
2. call the method which is an iterator (NODE_CALL)
3. yield (NODE_YIELD)
that’s all.
Push a block
First, let’s start with the first step, that is NODE_ITER, which is the node to push a block.
▼ rb_eval() − NODE_ITER (simplified)
case NODE_ITER:
  {
    iter_retry:
      PUSH_TAG(PROT_FUNC);
      PUSH_BLOCK(node->nd_var, node->nd_body);

      state = EXEC_TAG();
      if (state == 0) {
          PUSH_ITER(ITER_PRE);
          result = rb_eval(self, node->nd_iter);
          POP_ITER();
      }
      else if (_block.tag->dst == state) {
          state &= TAG_MASK;
          if (state == TAG_RETURN || state == TAG_BREAK) {
              result = prot_tag->retval;
          }
      }
      POP_BLOCK();
      POP_TAG();
      switch (state) {
        case 0:
          break;

        case TAG_RETRY:
          goto iter_retry;

        case TAG_BREAK:
          break;

        case TAG_RETURN:
          return_value(result);
          /* fall through */
        default:
          JUMP_TAG(state);
      }
  }
  break;
The original code also contains the support for the for statement, which is deleted here. After removing the code relating to tags, only the push/pop of ITER and BLOCK are left. Because the rest is just ordinarily doing rb_eval() with NODE_FCALL, these ITER and BLOCK are the necessary conditions to turn a method into an iterator.
The necessity of pushing BLOCK is fairly reasonable, but what’s ITER for? Actually, to think about the meaning of ITER, you need to think from the viewpoint of the side that uses BLOCK.
For example, suppose a method has just been called, and ruby_block exists. But since BLOCK is pushed regardless of the boundaries between method calls, the existence of a block does not mean the block was pushed for that method. It’s possible that the block was pushed for the previous method. (Figure 1)
Figure 1: no one-to-one correspondence between FRAME and BLOCK
So, in order to determine for which method the block was pushed, ITER is used. BLOCK is not pushed for each FRAME because pushing BLOCK is a little heavy. How heavy is it? Let’s check it in practice.
PUSH_BLOCK()
The arguments of PUSH_BLOCK() are (the syntax tree of) the block parameter and the block body.
▼ PUSH_BLOCK() POP_BLOCK()
#define PUSH_BLOCK(v,b) do {            \
    struct BLOCK _block;                \
    _block.tag = new_blktag();          \
    _block.var = v;                     \
    _block.body = b;                    \
    _block.self = self;                 \
    _block.frame = *ruby_frame;         \
    _block.klass = ruby_class;          \
    _block.frame.node = ruby_current_node;\
    _block.scope = ruby_scope;          \
    _block.prev = ruby_block;           \
    _block.iter = ruby_iter->iter;      \
    _block.vmode = scope_vmode;         \
    _block.flags = BLOCK_D_SCOPE;       \
    _block.dyna_vars = ruby_dyna_vars;  \
    _block.wrapper = ruby_wrapper;      \
    ruby_block = &_block

#define POP_BLOCK()                     \
   if (_block.tag->flags & (BLOCK_DYNAMIC))              \
       _block.tag->flags |= BLOCK_ORPHAN;                \
   else if (!(_block.scope->flags & SCOPE_DONT_RECYCLE)) \
       rb_gc_force_recycle((VALUE)_block.tag);           \
   ruby_block = _block.prev;            \
} while (0)
(eval.c)
Let’s make sure that a BLOCK is “the snapshot of the environment at the moment of creation”. As a proof of it, except for CREF and BLOCK, the six stack frames are saved. CREF can be substituted by ruby_frame->cbase, so there’s no need to push it.
And I’d like to check three points about the mechanism of the push. BLOCK is fully allocated on the stack. BLOCK contains a full copy of FRAME at the moment. BLOCK differs from many of the other stack frame structs in having a pointer to the previous BLOCK (prev).
The flags used in various ways at POP_BLOCK() are not explained now because they can only be understood after seeing the implementation of Proc later.
And as for the talk about “BLOCK is heavy”, it certainly seems a little heavy. Looking inside new_blktag(), we can see it does malloc() and stores plenty of members. But let’s defer the final judgment until after looking at PUSH_ITER() and comparing with it.
PUSH_ITER()
▼ PUSH_ITER() POP_ITER()
#define PUSH_ITER(i) do {               \
    struct iter _iter;                  \
    _iter.prev = ruby_iter;             \
    _iter.iter = (i);                   \
    ruby_iter = &_iter

#define POP_ITER()                      \
    ruby_iter = _iter.prev;             \
} while (0)
(eval.c)
In contrast, this is apparently light. It uses only stack space and has only two members. Even if this is pushed for each FRAME, it would probably matter little.
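The trick of splitting a do { … } while (0) across a PUSH macro and a POP macro is worth trying once by hand. The following is a minimal sketch of the same pattern with hypothetical toy_ names; it assumes nothing about ruby beyond what the listing above shows.

```c
#include <assert.h>
#include <stddef.h>

struct toy_iter {
    int iter;
    struct toy_iter *prev;
};

/* Top of the linked stack; NULL when nothing is pushed. */
static struct toy_iter *toy_iter_top = NULL;

/* PUSH opens a "do {" block and POP closes it, so the two must be paired
   inside one function. The node itself lives in that function's stack
   frame, which is why no malloc() is needed. */
#define TOY_PUSH_ITER(i) do {            \
    struct toy_iter _iter;               \
    _iter.prev = toy_iter_top;           \
    _iter.iter = (i);                    \
    toy_iter_top = &_iter

#define TOY_POP_ITER()                   \
    toy_iter_top = _iter.prev;           \
} while (0)

/* Counts the nodes currently linked from the top. */
static int toy_depth(void)
{
    int n = 0;
    struct toy_iter *p;
    for (p = toy_iter_top; p; p = p->prev)
        n++;
    return n;
}

/* Pushes two nodes, measures the depth, then pops both. */
static int toy_nested_depth(void)
{
    int depth;
    TOY_PUSH_ITER(1);
    TOY_PUSH_ITER(2);     /* the inner _iter shadows the outer one */
    depth = toy_depth();
    TOY_POP_ITER();
    TOY_POP_ITER();
    return depth;
}
```

A consequence of this design is that forgetting the POP is a compile error (the braces do not balance), and leaving the function by a plain return between PUSH and POP would leave the global pointer dangling. This is one reason the evaluator goes through the tag mechanism instead of returning early.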
Iterator Method Call
After pushing a block, the next thing is to call an iterator method (a method which is an iterator). This also needs a little machinery. Do you remember that there is code to modify the value of ruby_iter at the beginning of rb_call0? Here it is.
▼ rb_call0() − moving to ITER_CUR
switch (ruby_iter->iter) {
  case ITER_PRE:
    itr = ITER_CUR;
    break;
  case ITER_CUR:
  default:
    itr = ITER_NOT;
    break;
}
(eval.c)
Since ITER_PRE was pushed previously at NODE_ITER, this code makes ruby_iter ITER_CUR. At this moment, a method finally “becomes” an iterator. Figure 2 shows the state of the stacks.
Figure 2: the state of the Ruby stacks on an iterator call.
The possible value of ruby_iter is not one of two boolean values (for that method or not) but one of three steps, because there is a small gap between the moment a block is pushed and the moment an iterator method is invoked. For example, there is the evaluation of the arguments of the iterator method. Since it can contain method calls, one of those methods could mistakenly think that the just-pushed block is for itself and use it during the evaluation. Therefore, the moment when a method becomes an iterator, that is, when it turns into ITER_CUR, has to be inside rb_call(), just before the invocation finishes.
▼ the processing order
method(arg) { block }      # push a block
method(arg) { block }      # evaluate the arguments
method(arg) { block }      # a method call
For example, in the last chapter “Method”, there’s a macro named BEGIN_CALLARGS at the handler of NODE_CALL. This is where the three-step ITER is put to use. Let’s go back a little and look at it.
BEGIN_CALLARGS END_CALLARGS
▼ BEGIN_CALLARGS END_CALLARGS
#define BEGIN_CALLARGS do {\
    struct BLOCK *tmp_block = ruby_block;\
    if (ruby_iter->iter == ITER_PRE) {\
        ruby_block = ruby_block->prev;\
    }\
    PUSH_ITER(ITER_NOT)

#define END_CALLARGS \
    ruby_block = tmp_block;\
    POP_ITER();\
} while (0)
(eval.c)
When ruby_iter is ITER_PRE, ruby_block is set aside. This code is important, for instance, in the case below:
obj.m1 { yield }.m2 { nil }
The evaluation order of this expression is:
1. push the block of m2
2. push the block of m1
3. call the method m1
4. call the method m2
Therefore, if there were no BEGIN_CALLARGS, m1 would call the block of m2.
And if one more iterator is connected, the number of BEGIN_CALLARGS increases accordingly, so there’s no problem.
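The three ITER states and the role of the ITER_NOT pushed by BEGIN_CALLARGS can be traced with a tiny simulation. This is only a sketch: the toy_ names are invented here, the ITER stack is reduced to a small array, and the block stack is left out entirely.

```c
#include <assert.h>

enum toy_iter { TOY_ITER_NOT, TOY_ITER_PRE, TOY_ITER_CUR };

/* What the switch at the top of rb_call0() computes from the top of the
   ITER stack when a method starts. */
static enum toy_iter toy_iter_for_callee(enum toy_iter top)
{
    return top == TOY_ITER_PRE ? TOY_ITER_CUR : TOY_ITER_NOT;
}

/* Walks through `method(arg()) { block }` and reports what each callee
   would see as its own ITER value. */
static void toy_trace(enum toy_iter *seen_by_arg, enum toy_iter *seen_by_method)
{
    enum toy_iter stack[4];
    int sp = 0;

    stack[sp++] = TOY_ITER_PRE;   /* NODE_ITER has pushed the block */
    stack[sp++] = TOY_ITER_NOT;   /* argument evaluation begins */
    *seen_by_arg = toy_iter_for_callee(stack[sp - 1]);    /* arg() is called */
    sp--;                         /* argument evaluation ends */
    *seen_by_method = toy_iter_for_callee(stack[sp - 1]); /* method() is called */
}
```

With only two states, arg() would have no way to tell that the pending block belongs to somebody else; the intermediate “pushed but not yet claimed” state is what makes the distinction possible.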
Block Invocation
The third and last phase of iterator invocation is block invocation.
▼ rb_eval() − NODE_YIELD
case NODE_YIELD:
  if (node->nd_stts) {
      result = avalue_to_yvalue(rb_eval(self, node->nd_stts));
  }
  else {
      result = Qundef;    /* no arg */
  }
  SET_CURRENT_SOURCE();
  result = rb_yield_0(result, 0, 0, 0);
  break;
(eval.c)
nd_stts is the parameter of yield. avalue_to_yvalue() was mentioned a little at the multiple assignments, but you can ignore it. ((errata: actually, it was not mentioned. You can ignore this anyway.))
The heart of the behavior is not this but rb_yield_0(). Since this function is also very long, I show the code after extremely simplifying it. Most of the simplification methods have been used before.
cut the code relating to trace_func.
cut errors.
cut the code that exists only to protect against GC.
As with massign(), there’s the parameter pcall. This parameter changes the strictness of the parameter check, so it is not important here. Therefore, assume pcall = 0 and perform constant folding.
And this time, I turn on the “optimize for readability” option as follows.
when a branch has several equivalent kinds of branches, leave the main one and cut the rest.
if a condition is true/false in almost all cases, assume it is true/false.
assume no tag jump occurs, and delete all the code relating to tags.
With all of this done, it becomes much shorter.
▼ rb_yield_0() (simplified)
static VALUE
rb_yield_0(val, self, klass, /* pcall=0 */)
    VALUE val, self, klass;
{
    volatile VALUE result = Qnil;
    volatile VALUE old_cref;
    volatile VALUE old_wrapper;
    struct BLOCK * volatile block;
    struct SCOPE * volatile old_scope;
    struct FRAME frame;
    int state;

    PUSH_VARS();
    PUSH_CLASS();
    block = ruby_block;
    frame = block->frame;
    frame.prev = ruby_frame;
    ruby_frame = &(frame);
    old_cref = (VALUE)ruby_cref;
    ruby_cref = (NODE*)ruby_frame->cbase;
    old_wrapper = ruby_wrapper;
    ruby_wrapper = block->wrapper;
    old_scope = ruby_scope;
    ruby_scope = block->scope;
    ruby_block = block->prev;
    ruby_dyna_vars = new_dvar(0, 0, block->dyna_vars);
    ruby_class = block->klass;
    self = block->self;

    /* set the block arguments */
    massign(self, block->var, val, pcall);

    PUSH_ITER(block->iter);
    /* execute the block body */
    result = rb_eval(self, block->body);
    POP_ITER();

    POP_CLASS();
    /* …… collect ruby_dyna_vars …… */
    POP_VARS();
    ruby_block = block;
    ruby_frame = ruby_frame->prev;
    ruby_cref = (NODE*)old_cref;
    ruby_wrapper = old_wrapper;
    ruby_scope = old_scope;

    return result;
}
As you can see, most of the stack frames are replaced with what was saved in ruby_block. The ones that are simply saved and restored are easy to understand, so let’s look at the handling of the other frames, which we need to be careful about.
FRAME
struct FRAME frame;

frame = block->frame;     /* copy the entire struct */
frame.prev = ruby_frame;  /* by these two lines…… */
ruby_frame = &(frame);    /* ……frame is pushed */
Differing from the other frames, a FRAME is not used in its saved state; instead, a new FRAME is created by duplicating it. This looks like Figure 3.
Figure 3: push a copied frame
From the code we’ve seen so far, it seems that a FRAME is never “reused”. Whenever a FRAME is pushed, a new FRAME is created.
BLOCK
block = ruby_block;
         :
ruby_block = block->prev;
         :
ruby_block = block;
The most mysterious thing is this behavior of BLOCK. We can’t easily tell whether it is saving or popping. It’s comprehensible that the first statement and the third statement form a pair and that the state will eventually be restored. However, what is the consequence of the second statement?
To put the conclusion of my long pondering in one phrase: “going back to the ruby_block of the moment when the block was pushed”. An iterator is, in short, the syntax to go back to the previous frame. Therefore, all we have to do is turn the state of the stack frames into what it was at the moment the block was created. And the value of ruby_block at the moment the block was created was certainly block->prev. Therefore, it is contained in prev.
Additionally, to the question “is it no problem to assume that what is invoked is always the top of ruby_block?”, there’s no choice but to answer “on the rb_yield_0 side, you can assume so”. Pushing the block that should be invoked onto the top of ruby_block is the work of the side that prepares the block, not the work of rb_yield_0.
An example of it is BEGIN_CALLARGS, which was discussed in the previous chapter. When iterator calls cascade, two blocks are pushed and the top of the stack will be the block that should not be used. Therefore, it is purposefully checked and set aside.
VARS
Come to think of it, I think we have not looked at the contents of PUSH_VARS() and POP_VARS() yet. Let’s see them here.
▼ PUSH_VARS() POP_VARS()
#define PUSH_VARS() do { \
    struct RVarmap * volatile _old; \
    _old = ruby_dyna_vars; \
    ruby_dyna_vars = 0

#define POP_VARS() \
   if (_old && (ruby_scope->flags & SCOPE_DONT_RECYCLE)) {   \
       if (RBASIC(_old)->flags) /* if were not recycled */ \
           FL_SET(_old, DVAR_DONT_RECYCLE);                  \
   }                                                         \
   ruby_dyna_vars = _old;                                    \
} while (0)
(eval.c)
This also does not push a new struct; “set aside/restore” is a closer description. In practice, in rb_yield_0, PUSH_VARS() is used only to set aside the value. What actually prepares ruby_dyna_vars is this line.
ruby_dyna_vars = new_dvar(0, 0, block->dyna_vars);
This takes the dyna_vars saved in BLOCK and sets it. An entry is attached at the same time. I’d like you to recall the description of the structure of ruby_dyna_vars in Part 2: it said that an RVarmap whose id is 0, such as the one created here, is used as the break between block scopes.
However, in fact, the form of the link stored in ruby_dyna_vars is slightly different between the parser and the evaluator. Let’s look at the dvar_asgn_curr() function, which assigns a block local variable in the current block.
▼ dvar_asgn_curr()
static inline void
dvar_asgn_curr(id, value)
    ID id;
    VALUE value;
{
    dvar_asgn_internal(id, value, 1);
}

static void
dvar_asgn_internal(id, value, curr)
    ID id;
    VALUE value;
    int curr;
{
    int n = 0;
    struct RVarmap *vars = ruby_dyna_vars;

    while (vars) {
        if (curr && vars->id == 0) {
            /* first null is a dvar header */
            n++;
            if (n == 2) break;
        }
        if (vars->id == id) {
            vars->val = value;
            return;
        }
        vars = vars->next;
    }
    if (!ruby_dyna_vars) {
        ruby_dyna_vars = new_dvar(id, value, 0);
    }
    else {
        vars = new_dvar(id, value, ruby_dyna_vars->next);
        ruby_dyna_vars->next = vars;
    }
}
(eval.c)
The last if statement is to add a variable. If we focus on it, we can see that a link is always inserted at the “next” of ruby_dyna_vars. This means it looks like Figure 4.
Figure 4: the structure of ruby_dyna_vars
This differs from the case of the parser in one point: the headers (id=0) that indicate the breaks between scopes are attached before the links. If a header were attached after the links, the first entry of the scope could not be inserted properly. (Figure 5) ((errata: It was described that ruby_dyna_vars of the evaluator always forms a single straight link. But according to the errata, it was wrong. That part and relevant descriptions are removed.))
Figure 5: The entry cannot be inserted properly.
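The “header first, insert right after it” behavior can be reproduced with a small stand-alone list. The sketch below is an assumption-laden imitation of dvar_asgn_internal(id, value, 1): struct toy_var replaces struct RVarmap, plain malloc() replaces new_dvar(), and the list top is passed explicitly instead of living in the global ruby_dyna_vars.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_var {
    int id;                  /* 0 marks a scope header */
    long val;
    struct toy_var *next;
};

/* Allocates one link; aborts when out of memory to keep the sketch short. */
static struct toy_var *toy_new_var(int id, long val, struct toy_var *next)
{
    struct toy_var *v = malloc(sizeof *v);
    if (!v) abort();
    v->id = id;
    v->val = val;
    v->next = next;
    return v;
}

/* Assigns `id` in the current block scope only. The search stops at the
   second header, because everything after it belongs to an outer scope. */
static void toy_asgn_curr(struct toy_var **top, int id, long val)
{
    struct toy_var *vars = *top;
    int headers = 0;

    while (vars) {
        if (vars->id == 0 && ++headers == 2)
            break;                       /* reached the enclosing scope */
        if (vars->id == id) {
            vars->val = val;             /* already defined here: update */
            return;
        }
        vars = vars->next;
    }
    if (!*top) {
        *top = toy_new_var(id, val, NULL);
    }
    else {
        /* insert right after the header that sits at the top */
        (*top)->next = toy_new_var(id, val, (*top)->next);
    }
}
```

Since the top of the list is always the header of the innermost scope, inserting at its next keeps every new variable on the inner side of the boundary, which is the point Figure 5 makes from the opposite direction.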
Target Specified Jump
The code relating to jump tags was omitted from the previously shown code, but the jump handling of rb_yield_0 makes an effort that we’ve never seen before. Why is the effort necessary? I’ll tell the reason in advance. I’d like you to see the program below:
[0].each do
  break
end
# the place to reach by break
As in this example, when doing break from inside a block, it is necessary to get out of the block and go to the method that pushed the block. What does that actually mean? Let’s think by looking at the (dynamic) call graph when invoking an iterator.
rb_eval(NODE_ITER)                   .... catch(TAG_BREAK)
    rb_eval(NODE_CALL)               .... catch(TAG_BREAK)
        rb_eval(NODE_YIELD)
            rb_yield_0
                rb_eval(NODE_BREAK)  .... throw(TAG_BREAK)
Since what pushed the block is NODE_ITER, break should go back to a NODE_ITER. However, NODE_CALL is waiting for TAG_BREAK before NODE_ITER, in order to turn a break across methods into an error. This is a problem. We need to somehow find a way to go straight back to a NODE_ITER.
And actually, “going back to a NODE_ITER” is still a problem. If iterators are nested, there can be multiple NODE_ITERs, so the one corresponding to the current block is not always the first NODE_ITER. In other words, we need to target only “the NODE_ITER that pushed the block currently being invoked”.
Then, let’s see how this is resolved.
▼ rb_yield_0() − the parts relating to tags
PUSH_TAG(PROT_NONE);
if ((state = EXEC_TAG()) == 0) {
    /* …… evaluate the body …… */
}
else {
    switch (state) {
      case TAG_REDO:
        state = 0;
        CHECK_INTS;
        goto redo;
      case TAG_NEXT:
        state = 0;
        result = prot_tag->retval;
        break;
      case TAG_BREAK:
      case TAG_RETURN:
        state |= (serial++ << 8);
        state |= 0x10;
        block->tag->dst = state;
        break;
      default:
        break;
    }
}
POP_TAG();
(eval.c)
The parts of TAG_BREAK and TAG_RETURN are crucial.
First, serial is a static variable of rb_yield_0(), so its value is different on every call of rb_yield_0. “serial” is the serial of “serial number”.
The reason for left-shifting by 8 bits seems to be to avoid overlapping with the values of TAG_xxxx. TAG_xxxx is in the range 0x1 ~ 0x8, so 4 bits are enough. And the bit-or of 0x10 seems to guard against the overflow of serial. On a 32-bit machine, serial can use only 24 bits (only about 16 million times), and a recent machine can make it overflow in less than 10 seconds. If this happens, the top 24 bits all become 0. Therefore, if 0x10 did not exist, state would be the same value as TAG_xxxx (see also Figure 6).
Figure 6: block->tag->dst
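The bit layout is easy to verify with a few lines of C. This sketch makes several assumptions that are not taken from the listing: the tag values 0x1 and 0x2 and the mask 0xf are stand-ins for TAG_RETURN, TAG_BREAK and TAG_MASK, and unsigned arithmetic is used so that the wraparound is well defined (the original works on plain int).

```c
#include <assert.h>

#define TOY_TAG_RETURN 0x1   /* assumed value, for illustration only */
#define TOY_TAG_BREAK  0x2   /* assumed value, for illustration only */
#define TOY_TAG_MASK   0xf   /* assumed: the low 4 bits hold the TAG_xxxx value */

/* Plays the role of the static `serial` in rb_yield_0(). */
static unsigned int toy_serial = 0;

/* Mirrors the TAG_BREAK/TAG_RETURN arm: serial number in the upper bits,
   marker bit 0x10, original tag in the low 4 bits. */
static unsigned int toy_make_dst(unsigned int state)
{
    state |= (toy_serial++ << 8);
    state |= 0x10;
    return state;
}

/* Recovers the plain tag, as `state &= TAG_MASK` does on the catching side. */
static unsigned int toy_strip_dst(unsigned int dst)
{
    return dst & TOY_TAG_MASK;
}
```

Two consecutive calls give 0x12 and 0x112 for a break, so each yield gets its own destination value. Even when the serial part is all zero, as after a wraparound, the result 0x12 still differs from the bare tag 0x2 thanks to the 0x10 bit.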
Now, tag->dst has become a value that differs from TAG_xxxx and is unique for each call. In this situation, an ordinary switch like the previous ones cannot receive it, so the side that stops the jump needs to make some effort. The place that makes this effort is the NODE_ITER branch of rb_eval:
▼ rb_eval() − NODE_ITER (to stop jumps)
case NODE_ITER:
  {
      state = EXEC_TAG();
      if (state == 0) {
          /* …… invoke an iterator …… */
      }
      else if (_block.tag->dst == state) {
          state &= TAG_MASK;
          if (state == TAG_RETURN || state == TAG_BREAK) {
              result = prot_tag->retval;
          }
      }
  }
In the corresponding NODE_ITER and rb_yield_0, block should point to the same thing, so the tag->dst that was set at rb_yield_0 arrives here. Because of this, only the corresponding NODE_ITER can properly stop the jump.
Check of a block
Whether or not the currently evaluated method is an iterator, in other words, whether there’s a block, can be checked by rb_block_given_p(). After reading all of the above, we can understand its implementation.
▼ rb_block_given_p()
int
rb_block_given_p()
{
    if (ruby_frame->iter && ruby_block)
        return Qtrue;
    return Qfalse;
}
(eval.c)
Ithinkthere’snoproblem.WhatI’dliketotalkaboutthistimeisactuallyanotherfunctiontocheck,itisrb_f_block_given_p().
▼rb_f_block_given_p()
3740staticVALUE3741rb_f_block_given_p()3742{3743if(ruby_frame->prev&&ruby_frame->prev->iter&&ruby_block)3744returnQtrue;3745returnQfalse;3746}
(eval.c)
ThisisthesubstanceofRuby’sblock_given?.Incomparisontorb_block_given_p(),thisisdifferentincheckingtheprevofruby_frame.Whyisthis?
Thinkingaboutthemechanismtopushablock,tocheckthecurrentruby_framelikerb_block_given_p()isright.Butwhencallingblock_given?fromRuby-level,sinceblock_given?itselfisamethod,anextraFRAMEispushed.Hence,weneedtocheckthepreviousone.
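Seen from Ruby level, the check looks like the following small sketch. The method name iterator_call? is hypothetical, and the example only shows the observable behavior of block_given?, not the extra FRAME discussed above (which is an internal detail of the ruby version being read here).

```ruby
# block_given? answers for the method that calls it,
# not for block_given? itself.
def iterator_call?
  block_given?
end

p iterator_call?         # called without a block
p iterator_call? { 1 }   # called as an iterator
```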
Proc
To describe a Proc object from the viewpoint of the implementation, it is “a BLOCK which can be brought out to Ruby level”. Being able to bring it out to Ruby level means having more latitude, but it also means that when and where it will be used becomes completely unpredictable. Focusing on what influence this fact has, let’s look at the implementation.
Proc object creation
A Proc object is created with Proc.new. Its substance is proc_new().
▼ proc_new()
    static VALUE
    proc_new(klass)
        VALUE klass;
    {
        volatile VALUE proc;
        struct BLOCK *data, *p;
        struct RVarmap *vars;

        if (!rb_block_given_p() && !rb_f_block_given_p()) {
            rb_raise(rb_eArgError,
                     "tried to create Proc object without a block");
        }
        /* (A) allocate both struct RData and struct BLOCK */
        proc = Data_Make_Struct(klass, struct BLOCK,
                                blk_mark, blk_free, data);
        *data = *ruby_block;

        data->orig_thread = rb_thread_current();
        data->wrapper = ruby_wrapper;
        data->iter = data->prev ? Qtrue : Qfalse;
        /* (B) the essential initialization is finished by here */
        frame_dup(&data->frame);
        if (data->iter) {
            blk_copy_prev(data);
        }
        else {
            data->prev = 0;
        }
        data->flags |= BLOCK_DYNAMIC;
        data->tag->flags |= BLOCK_DYNAMIC;

        for (p = data; p; p = p->prev) {
            for (vars = p->dyna_vars; vars; vars = vars->next) {
                if (FL_TEST(vars, DVAR_DONT_RECYCLE)) break;
                FL_SET(vars, DVAR_DONT_RECYCLE);
            }
        }
        scope_dup(data->scope);
        proc_save_safe_level(proc);

        return proc;
    }
(eval.c)
The creation of a Proc object itself is unexpectedly simple. Between (A) and (B), space for a Proc object is allocated and its initialization completes. Data_Make_Struct() is a simple macro that does both malloc() and Data_Wrap_Struct() at the same time.
The problems come after that:
frame_dup()
blk_copy_prev()
FL_SET(vars, DVAR_DONT_RECYCLE)
scope_dup()
These four share the same purposes. They are:
move everything that was put on the machine stack to the heap
prevent it from being collected even after POP
Here, “all” means everything, including prev. For all the stack frames pushed there, it duplicates each frame by doing malloc() and copying. VARS is usually forced to be collected by rb_gc_force_recycle() at the moment of POP, but this behavior is stopped by setting the DVAR_DONT_RECYCLE flag. And so on. Really extreme things are done.
Why are these extreme things necessary? This is because, unlike iterator blocks, a Proc can persist longer than the method that created it. And the end of a method means that the things allocated on the machine stack, such as FRAME, ITER, and local_vars of SCOPE, are invalidated. It’s easy to predict the consequence of using invalidated memory. (An example answer: it becomes troublesome.)
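The situation that forces all of this copying is easy to reproduce from Ruby level. In the sketch below (make_counter is a hypothetical name), the Proc is still used after the method that created it has returned, so the local variable count must survive the end of that method.

```ruby
def make_counter
  count = 0                 # a local variable of make_counter
  Proc.new { count += 1 }   # the Proc keeps referring to it
end

counter = make_counter      # make_counter has already returned here
p counter.call
p counter.call              # count is still alive and keeps growing
```

Each call of make_counter produces a separate count, which is consistent with the environment being saved per Proc.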
I tried to contrive a way to at least use the same FRAME from multiple Procs, but since there are places such as old_frame where pointers to the local variables are set aside, it does not seem to go well. If it requires a lot of effort anyway, another approach, say, allocating all of them with malloc() from the first place, seems more worth a try.
Anyway, I sentimentally think that it’s surprising that it runs at that speed even though it does these extreme things. Indeed, these are good times.
Floating Frame
Previously, I covered it in just one phrase, “duplicate all frames”, but since that was unclear, let’s look at more details. The points are the following two:
How to duplicate all
Why all of them are duplicated
First, let’s start with a summary of how each stack frame is saved.
Frame         location   has prev pointer?
FRAME         stack      yes
SCOPE         stack      no
  local_tbl   heap
  local_vars  stack
VARS          heap       no
BLOCK         stack      yes
CLASS, CREF and ITER are not necessary this time. Since CLASS is a general Ruby object, rb_gc_force_recycle() is not called on it even by mistake (it’s impossible), and both CREF and ITER become unnecessary once their values at that moment are stored in FRAME. The four frames in the above table are important because they will be modified or referred to multiple times later. The remaining three will not.
Then, the talk moves on to how to duplicate all. I said “how”, but it is not about things like “by malloc()”. The problem is how to duplicate “all”. The reason is, and here I’d like you to look at the above table, that there are some frames without any prev pointer. In other words, we cannot follow links. In this situation, how can we duplicate all?
A fairly clever technique is used to counter this. Let’s take SCOPE as an example. A function named scope_dup() was used previously to duplicate SCOPE, so let’s see it first.
▼ scope_dup() only the beginning
    static void
    scope_dup(scope)
        struct SCOPE *scope;
    {
        ID *tbl;
        VALUE *vars;

        scope->flags |= SCOPE_DONT_RECYCLE;
(eval.c)
As you can see, SCOPE_DONT_RECYCLE is set. Next, take a look at the definition of POP_SCOPE():
▼ POP_SCOPE() only the beginning
    #define POP_SCOPE()                                      \
        if (ruby_scope->flags & SCOPE_DONT_RECYCLE) {        \
            if (_old) scope_dup(_old);                       \
        }                                                    \
(eval.c)
When it pops, if the SCOPE_DONT_RECYCLE flag is set on the current SCOPE (ruby_scope), it also does scope_dup() on the previous SCOPE (_old). In other words, SCOPE_DONT_RECYCLE is set on that one too. In this way, one by one, the flag is propagated at the time of each pop. (Figure 7)
Figure 7: flag propagation
Since VARS also does not have any prev pointer, the same technique is used to propagate the DVAR_DONT_RECYCLE flag.
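The propagation trick can be modelled in a few lines of Ruby. This is only a sketch under my own assumptions: ScopeModel, pop_scope and propagation_demo are hypothetical names, an Array stands in for the machine stack, and scope_dup here only sets the flag instead of copying anything to the heap.

```ruby
# A scope that, like SCOPE, has no pointer to its predecessor.
ScopeModel = Struct.new(:name, :dont_recycle)

# Stands in for scope_dup(): mark the scope as "must survive".
def scope_dup(scope)
  scope.dont_recycle = true
end

# Stands in for POP_SCOPE(): if the popped scope is flagged,
# flag the one that becomes current again (_old in the macro).
def pop_scope(stack)
  current = stack.pop
  old = stack.last
  scope_dup(old) if current.dont_recycle && old
  current
end

def propagation_demo
  scopes = %w[top a b].map { |name| ScopeModel.new(name, false) }
  stack = scopes.dup
  scope_dup(stack.last)                  # a Proc is created inside "b"
  before = scopes.map(&:dont_recycle)
  pop_scope(stack)                       # leaving "b" flags "a"
  pop_scope(stack)                       # leaving "a" flags "top"
  [before, scopes.map(&:dont_recycle)]
end

p propagation_demo
```

Nobody ever walks a chain of scopes; each pop only looks at its immediate neighbour, which is why the missing prev pointer does not matter.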
Next, the second point: let’s think about “why all of them are duplicated”. We can understand that the local variables of a SCOPE may be referred to later if a Proc is created from it. However, is it necessary to copy all of them, including the previous SCOPE, in order to accomplish that?
Honestly speaking, I couldn’t find the answer to this question and worried for almost three days about how I could write this section. I’ve just got the answer. Take a look at the next program:
    def get_proc
      Proc.new { nil }
    end

    env = get_proc { p 'ok' }
    eval("yield", env)
I have not explained this feature yet, but by passing a Proc object as the second argument of eval, you can evaluate the string in that environment.
This means, as readers who have read this far can probably tell, that it pushes the various environments taken from the Proc (meaning BLOCK) and evaluates. In this case, it naturally also pushes BLOCK, and you can turn that BLOCK into a Proc again. Then, using that Proc when doing eval… if things are done like this, you can access almost all information of ruby_block from Ruby level as you like. This is the reason why the entire stacks need to be fully duplicated. ((errata: we cannot access ruby_block as we like from Ruby level. The reason why all SCOPEs are duplicated was not understood. It seems all we can do is investigate the mailing list archives from the time when this change was applied. (It is still not certain whether we can find out the reason in this way.)))
Invocation of Proc
Next, we’ll look at the invocation of a created Proc. Since Proc#call can be used from Ruby to invoke it, we can follow its substance.
The substance of Proc#call is proc_call():
▼ proc_call()
    static VALUE
    proc_call(proc, args)
        VALUE proc, args;           /* OK */
    {
        return proc_invoke(proc, args, Qtrue, Qundef);
    }
(eval.c)
It delegates to proc_invoke(). When I looked up invoke in a dictionary, it said something like “call on (God, etc.) for help”, but in the context of programming it is often used with almost the same meaning as “activate”.
The prototype of proc_invoke() is,
    proc_invoke(VALUE proc, VALUE args, int pcall, VALUE self)
However, according to the previous code, pcall=Qtrue and self=Qundef in this case, so these two can be removed by constant folding.
▼ proc_invoke (simplified)
    static VALUE
    proc_invoke(proc, args, /* pcall=Qtrue */, /* self=Qundef */)
        VALUE proc, args;
        VALUE self;
    {
        struct BLOCK * volatile old_block;
        struct BLOCK _block;
        struct BLOCK *data;
        volatile VALUE result = Qnil;
        int state;
        volatile int orphan;
        volatile int safe = ruby_safe_level;
        volatile VALUE old_wrapper = ruby_wrapper;
        struct RVarmap * volatile old_dvars = ruby_dyna_vars;

        /* (A) take BLOCK from proc and assign it to data */
        Data_Get_Struct(proc, struct BLOCK, data);
        /* (B) blk_orphan */
        orphan = blk_orphan(data);

        ruby_wrapper = data->wrapper;
        ruby_dyna_vars = data->dyna_vars;
        /* (C) push BLOCK from data */
        old_block = ruby_block;
        _block = *data;
        ruby_block = &_block;

        /* (D) transition to ITER_CUR */
        PUSH_ITER(ITER_CUR);
        ruby_frame->iter = ITER_CUR;

        PUSH_TAG(PROT_NONE);
        state = EXEC_TAG();
        if (state == 0) {
            proc_set_safe_level(proc);
            /* (E) invoke the block */
            result = rb_yield_0(args, self, 0, pcall);
        }
        POP_TAG();

        POP_ITER();
        if (ruby_block->tag->dst == state) {
            state &= TAG_MASK;      /* target specified jump */
        }
        ruby_block = old_block;
        ruby_wrapper = old_wrapper;
        ruby_dyna_vars = old_dvars;
        ruby_safe_level = safe;

        switch (state) {
          case 0:
            break;
          case TAG_BREAK:
            result = prot_tag->retval;
            break;
          case TAG_RETURN:
            if (orphan) {   /* orphan procedure */
                localjump_error("return from proc-closure",
                                prot_tag->retval);
            }
            /* fall through */
          default:
            JUMP_TAG(state);
        }
        return result;
    }
There are three crucial points: C, D, and E.
(C) At NODE_ITER a BLOCK is created from the syntax tree and pushed, but this time a BLOCK is taken from the Proc and pushed.
(D) Before, it was ITER_PRE and then became ITER_CUR at rb_call0(), but this time it goes directly into ITER_CUR.
(E) In the case of an ordinary iterator, a method call comes first, then yield occurs and control goes to rb_yield_0(), but this time rb_yield_0() is called directly and invokes the block that was just pushed.
In other words, in the case of an iterator the procedures are separated into three places, NODE_ITER ~ rb_call0() ~ NODE_YIELD. But this time, they are done all at once.
Finally, I’ll talk about the meaning of blk_orphan(). As the name suggests, it is a function to determine the state of “the method which created the Proc has finished”. For example, if the SCOPE used by a BLOCK has already been popped, you can conclude that the method has finished.
Block and Proc
In the previous chapter, various things about arguments and parameters of methods were discussed, but I have not described block parameters yet. Although it is brief, here I’ll give the final part of that series.
    def m(&block)
    end
This is a “block parameter”. The way to enable this is very simple. If m is an iterator, it is certain that a BLOCK was already pushed, so turn it into a Proc and assign it to (in this case) the local variable block. Turning a block into a Proc is just a matter of calling proc_new(), which was previously described. The reason why just calling it is enough may be a little hard to grasp. However, whether it is Proc.new or m, the situation “a method is called and a BLOCK is pushed” is the same. Therefore, from C level, you can turn a block into a Proc at any time by just calling proc_new().
And if m is not an iterator, all we have to do is simply assign nil.
Next is the side that passes a block.
    m(&block)
This is a “block argument”. This is also simple: take a BLOCK from (a Proc object stored in) block and push it. What differs from PUSH_BLOCK() is only whether a BLOCK has already been created in advance or not.
The function that does this procedure is block_pass(). If you are curious about it, check and confirm around it. However, it really does just what was described here, so it’s possible you’ll be disappointed…
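Both directions can be tried from Ruby level with a short sketch. The method names capture and relay are hypothetical; the example only shows the observable behavior of a block parameter and a block argument.

```ruby
# Block parameter: the pushed block arrives as a Proc, or nil if there is none.
def capture(&block)
  block
end

# Block argument: a Proc is pushed back as the block of another call.
def relay(&block)
  [1, 2, 3].map(&block)
end

p capture.nil?              # no block was given
p capture { 1 }.class       # the block became a Proc
p relay { |x| x * 2 }       # the Proc was used as the block of map
```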
The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License
Ruby Hacking Guide
Chapter 17: Dynamic evaluation
Overview
I have already finished describing the mechanism of the evaluator in the previous chapter. In this chapter, by including the parser in addition to it, let’s examine the big picture as “the evaluator in a broad sense”. There are three targets: eval, Module#module_eval and Object#instance_eval.
eval
I’ve already described eval, but I’ll introduce a few more small details about it here.
By using eval, you can compile and evaluate a string at runtime, on the spot. Its return value is the value of the last expression of the program.
    p eval("1 + 1")   # 2
You can also refer to a variable in its scope from inside the string passed to eval.
    lvar = 5
    @ivar = 6
    p eval("lvar + @ivar")   # 11
Readers who have read this far cannot simply skim over the words “its scope”. For instance, you are curious about what the “scope” of constants is, aren’t you? I am. To put the bottom line first, basically you can think of it as directly inheriting the environment outside of eval.
And you can also define methods and define classes.
    def a
      eval('class C;  def test() puts("ok") end  end')
    end

    a()          # define class C and C#test
    C.new.test   # shows ok
Moreover, as mentioned a little in the previous chapter, when you pass a Proc as the second argument, the string can be evaluated in its environment.
    def new_env
      n = 5
      Proc.new { nil }   # turn the environment of this method into an object and return it
    end

    p eval('n * 3', new_env())   # 15
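The program above is written for the ruby version this book reads. As a hedged side note, in current versions of Ruby the second argument of eval is expected to be a Binding object rather than a Proc, so a sketch of the same idea that runs today would use Kernel#binding:

```ruby
def new_env
  n = 5
  binding    # turn the environment of this method into an object and return it
end

p eval('n * 3', new_env)   # evaluated where n is visible
```

The idea is unchanged: the environment of new_env is carried out of the method as an object, and the string is compiled and evaluated inside it.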
module_eval and instance_eval
When a Proc is passed as the second argument of eval, the evaluation can be done in its environment. module_eval and instance_eval are limited (or shortcut) versions of that. With module_eval, you can evaluate in an environment that is as if inside a module statement or a class statement.
    lvar = "toplevel lvar"    # a local variable to confirm this scope

    module M
    end
    M.module_eval(<<'EOS')    # a suitable situation to use here-document
        p lvar   # referable
        p self   # shows M
        def ok   # define M#ok
          puts 'ok'
        end
    EOS
With instance_eval, you can evaluate in an environment like that of a singleton class statement in which self is the object.
    lvar = "toplevel lvar"    # a local variable to confirm this scope

    obj = Object.new
    obj.instance_eval(<<'EOS')
        p lvar   # referable
        p self   # shows #<Object:0x40274f5c>
        def ok   # define obj.ok
          puts 'ok'
        end
    EOS
Additionally, module_eval and instance_eval can also be used as iterators; in that case a block is evaluated in each environment. For instance,
    obj = Object.new
    p obj                 # #<Object:0x40274fac>
    obj.instance_eval {
        p self            # #<Object:0x40274fac>
    }
Like this.
However, the behavior around local variables differs between the case of using a string and the case of using a block. For example, when a block is created in method a and then passed to instance_eval in method b, the block refers to the local variables of a. When a string is created in method a and then passed to instance_eval in method b, the code inside the string refers to the local variables of b. The scope of local variables is decided “at compile time”, and the consequences differ because a string is compiled every time but a block is compiled when the file is loaded.
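The difference just described can be observed directly. The following is a small sketch with hypothetical method names: make_block and make_string play the role of method a, and compare_scopes plays the role of method b.

```ruby
def make_block
  v = "local of a"
  proc { v }          # compiled together with make_block, so v is a's variable
end

def make_string
  v = "local of a"
  "v"                 # just a string; nothing is compiled yet
end

def compare_scopes
  v = "local of b"
  obj = Object.new
  from_block  = obj.instance_eval(&make_block)   # sees the locals of a
  from_string = obj.instance_eval(make_string)   # compiled here, sees the locals of b
  [from_block, from_string]
end

p compare_scopes
```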
eval
eval()
The eval of Ruby branches many times based on the presence or absence of its parameters. Let’s assume the form of the call is limited to the following:
    eval(prog_string, some_block)
Then, since this makes the actual interface function rb_f_eval() almost meaningless, we’ll start with the function eval(), which is one step lower. The function prototype of eval() is:
    static VALUE eval(VALUE self, VALUE src, VALUE scope, char *file, int line);
scope is the Proc given as the second parameter. file and line are the file name and line number of where the string to eval is supposed to be located. Then, let’s see the content:
▼ eval() (simplified)
    static VALUE
    eval(self, src, scope, file, line)
        VALUE self, src, scope;
        char *file;
        int line;
    {
        struct BLOCK *data = NULL;
        volatile VALUE result = Qnil;
        struct SCOPE * volatile old_scope;
        struct BLOCK * volatile old_block;
        struct RVarmap * volatile old_dyna_vars;
        VALUE volatile old_cref;
        int volatile old_vmode;
        volatile VALUE old_wrapper;
        struct FRAME frame;
        NODE *nodesave = ruby_current_node;
        volatile int iter = ruby_frame->iter;
        int state;

        if (!NIL_P(scope)) {  /* always true now */
            Data_Get_Struct(scope, struct BLOCK, data);
            /* push BLOCK from data */
            frame = data->frame;
            frame.tmp = ruby_frame; /* to prevent from GC */
            ruby_frame = &(frame);
            old_scope = ruby_scope;
            ruby_scope = data->scope;
            old_block = ruby_block;
            ruby_block = data->prev;
            old_dyna_vars = ruby_dyna_vars;
            ruby_dyna_vars = data->dyna_vars;
            old_vmode = scope_vmode;
            scope_vmode = data->vmode;
            old_cref = (VALUE)ruby_cref;
            ruby_cref = (NODE*)ruby_frame->cbase;
            old_wrapper = ruby_wrapper;
            ruby_wrapper = data->wrapper;
            self = data->self;
            ruby_frame->iter = data->iter;
        }
        PUSH_CLASS();
        ruby_class = ruby_cbase;  /* == ruby_frame->cbase */

        ruby_in_eval++;
        if (TYPE(ruby_class) == T_ICLASS) {
            ruby_class = RBASIC(ruby_class)->klass;
        }
        PUSH_TAG(PROT_NONE);
        if ((state = EXEC_TAG()) == 0) {
            NODE *node;

            result = ruby_errinfo;
            ruby_errinfo = Qnil;
            node = compile(src, file, line);
            if (ruby_nerrs > 0) {
                compile_error(0);
            }
            if (!NIL_P(result)) ruby_errinfo = result;
            result = eval_node(self, node);
        }
        POP_TAG();
        POP_CLASS();
        ruby_in_eval--;
        if (!NIL_P(scope)) {  /* always true now */
            int dont_recycle = ruby_scope->flags & SCOPE_DONT_RECYCLE;

            ruby_wrapper = old_wrapper;
            ruby_cref  = (NODE*)old_cref;
            ruby_frame = frame.tmp;
            ruby_scope = old_scope;
            ruby_block = old_block;
            ruby_dyna_vars = old_dyna_vars;
            data->vmode = scope_vmode; /* save the modification of the visibility scope */
            scope_vmode = old_vmode;
            if (dont_recycle) {
                /* …… copy SCOPE BLOCK VARS …… */
            }
        }
        if (state) {
            if (state == TAG_RAISE) {
                /* …… prepare an exception object …… */
                rb_exc_raise(ruby_errinfo);
            }
            JUMP_TAG(state);
        }

        return result;
    }
(eval.c)
If this function were shown without any preamble, you would probably feel “oww!”. But we’ve defeated many functions of eval.c up to here, so this is not enough to be an enemy for us. This function is just continuously saving and restoring the stacks. The points we need to care about are only the following three:
unusually, FRAME is also replaced (not copied and pushed)
ruby_cref is substituted (?) by ruby_frame->cbase
only scope_vmode is not simply restored but influences data
And the main parts are the compile() and eval_node() located around the middle. Though eval_node() may already have been forgotten, it is the function that starts the evaluation of the node given as its parameter. It was also used in ruby_run().
Here is compile().
▼ compile()
    static NODE*
    compile(src, file, line)
        VALUE src;
        char *file;
        int line;
    {
        NODE *node;

        ruby_nerrs = 0;
        Check_Type(src, T_STRING);
        node = rb_compile_string(file, src, line);

        if (ruby_nerrs == 0) return node;
        return 0;
    }
(eval.c)
ruby_nerrs is the variable incremented in yyerror(). In other words, if this variable is non-zero, it indicates that at least one parse error happened. And rb_compile_string() was already discussed in Part 2. It is a function that compiles a Ruby string into a syntax tree.
One thing that becomes a problem here is local variables. As we’ve seen in Chapter 12: Syntax tree construction, local variables are managed by using lvtbl. However, since a SCOPE (and possibly also VARS) already exists, we need to parse in a way that writes over and adds to it. This is in fact the heart of eval(), and is the most difficult part. Let’s go back to parse.y again and complete this investigation.
top_local
I’ve mentioned that the functions named local_push() and local_pop() are used when pushing struct local_vars, which is the management table of local variables, but actually there’s one more pair of functions that push the management table. It is the pair of top_local_init() and top_local_setup(). They are called in this sort of way.
▼ How top_local_init() is called
    program :   { top_local_init(); }
              compstmt
                { top_local_setup(); }
Of course, in actuality various other things are also done, but all of them are cut here because they are not important. And this is its content:
▼ top_local_init()
    static void
    top_local_init()
    {
        local_push(1);
        lvtbl->cnt = ruby_scope->local_tbl ? ruby_scope->local_tbl[0] : 0;
        if (lvtbl->cnt > 0) {
            lvtbl->tbl = ALLOC_N(ID, lvtbl->cnt+3);
            MEMCPY(lvtbl->tbl, ruby_scope->local_tbl, ID, lvtbl->cnt+1);
        }
        else {
            lvtbl->tbl = 0;
        }
        if (ruby_dyna_vars)
            lvtbl->dlev = 1;
        else
            lvtbl->dlev = 0;
    }
(parse.y)
This means that local_tbl is copied from ruby_scope to lvtbl. As for block local variables, since it’s better to see them all at once later, we’ll focus on ordinary local variables for the time being. Next, here is top_local_setup().
▼ top_local_setup()
    static void
    top_local_setup()
    {
        int len = lvtbl->cnt;  /* the number of local variables after parsing */
        int i;                 /* the number of local variables before parsing */

        if (len > 0) {
            i = ruby_scope->local_tbl ? ruby_scope->local_tbl[0] : 0;

            if (i < len) {
                if (i == 0 || (ruby_scope->flags & SCOPE_MALLOC) == 0) {
                    VALUE *vars = ALLOC_N(VALUE, len+1);
                    if (ruby_scope->local_vars) {
                        *vars++ = ruby_scope->local_vars[-1];
                        MEMCPY(vars, ruby_scope->local_vars, VALUE, i);
                        rb_mem_clear(vars+i, len-i);
                    }
                    else {
                        *vars++ = 0;
                        rb_mem_clear(vars, len);
                    }
                    ruby_scope->local_vars = vars;
                    ruby_scope->flags |= SCOPE_MALLOC;
                }
                else {
                    VALUE *vars = ruby_scope->local_vars-1;
                    REALLOC_N(vars, VALUE, len+1);
                    ruby_scope->local_vars = vars+1;
                    rb_mem_clear(ruby_scope->local_vars+i, len-i);
                }
                if (ruby_scope->local_tbl &&
                    ruby_scope->local_vars[-1] == 0) {
                    free(ruby_scope->local_tbl);
                }
                ruby_scope->local_vars[-1] = 0;  /* NODE is not necessary anymore */
                ruby_scope->local_tbl = local_tbl();
            }
        }
        local_pop();
    }
(parse.y)
Since local_vars can be either on the stack or in the heap, the code is somewhat complex. However, this is just updating local_tbl and local_vars of ruby_scope. (When SCOPE_MALLOC is set, local_vars was allocated by malloc().) And here, because there’s no point in using alloca(), it is forced to change its allocation method to malloc().
Block Local Variable
By the way, how about block local variables? To think about this, we first have to go back to the entry point of the parser, which is yycompile().
▼ setting ruby_dyna_vars aside
    static NODE*
    yycompile(f, line)
    {
        struct RVarmap *vars = ruby_dyna_vars;
             :
        n = yyparse();
             :
        ruby_dyna_vars = vars;
    }
This looks like a mere save-restore, but the point is that it does not clear ruby_dyna_vars. This means that the parser, too, directly adds elements to the RVarmap link created in the evaluator.
However, according to the previous description, the structure of ruby_dyna_vars differs between the parser and the evaluator. How does it deal with the difference in the way the header (the RVarmap whose id=0) is attached?
What is helpful here is the “1” of local_push(1) in top_local_init(). When the argument of local_push() is true, it does not attach the first header of ruby_dyna_vars. This means it would look like Figure 1. Now, it is assured that we can refer to the block local variables of the outside scope from inside the string passed to eval.
Figure 1: ruby_dyna_vars inside eval
Well, it’s certain that we can refer to them, but didn’t you say that ruby_dyna_vars is entirely freed in the parser? What can we do if the link created in the evaluator gets freed? … I’d like the readers who noticed this to be relieved by reading the next part.
▼ yycompile() − freeing ruby_dyna_vars
    vp = ruby_dyna_vars;
    ruby_dyna_vars = vars;
    lex_strterm = 0;
    while (vp && vp != vars) {
        struct RVarmap *tmp = vp;
        vp = vp->next;
        rb_gc_force_recycle((VALUE)tmp);
    }
(parse.y)
It is designed so that the loop stops when it reaches the link created in the evaluator (vars).
instance_eval
The Whole Picture
The substance of Module#module_eval is rb_mod_module_eval(), and the substance of Object#instance_eval is rb_obj_instance_eval().
▼ rb_mod_module_eval() rb_obj_instance_eval()
    VALUE
    rb_mod_module_eval(argc, argv, mod)
        int argc;
        VALUE *argv;
        VALUE mod;
    {
        return specific_eval(argc, argv, mod, mod);
    }

    VALUE
    rb_obj_instance_eval(argc, argv, self)
        int argc;
        VALUE *argv;
        VALUE self;
    {
        VALUE klass;

        if (rb_special_const_p(self)) {
            klass = Qnil;
        }
        else {
            klass = rb_singleton_class(self);
        }

        return specific_eval(argc, argv, klass, self);
    }
(eval.c)
These two methods have a common part as “a method to replace self with class”; that part is defined as specific_eval(). Figure 2 shows it and also what will be described next. The ones in parentheses are calls through function pointers.
Figure 2: Call Graph
Whether it is instance_eval or module_eval, it can accept both a block and a string, so it branches to yield and eval respectively for each particular process. However, most of that is also common again, and this part is extracted as exec_under().
But for those who read it, having to face 2 times 2 = 4 ways simultaneously is not a good plan. Therefore, here we assume only the case when
1. it is an instance_eval
2. which takes a string as its argument
and, expanding all functions under rb_obj_instance_eval() in-line and folding constants, we’ll read the result.
After Absorbed
After all, it becomes very comprehensible in comparison to the one before being absorbed.
▼ specific_eval() − instance_eval, eval, string
    static VALUE
    instance_eval_string(self, src, file, line)
        VALUE self, src;
        const char *file;
        int line;
    {
        VALUE sclass;
        VALUE result;
        int state;
        int mode;

        sclass = rb_singleton_class(self);

        PUSH_CLASS();
        ruby_class = sclass;
        PUSH_FRAME();
        ruby_frame->self       = ruby_frame->prev->self;
        ruby_frame->last_func  = ruby_frame->prev->last_func;
        ruby_frame->last_class = ruby_frame->prev->last_class;
        ruby_frame->argc       = ruby_frame->prev->argc;
        ruby_frame->argv       = ruby_frame->prev->argv;
        if (ruby_frame->cbase != sclass) {
            ruby_frame->cbase = rb_node_newnode(NODE_CREF, sclass, 0,
                                                ruby_frame->cbase);
        }
        PUSH_CREF(sclass);

        mode = scope_vmode;
        SCOPE_SET(SCOPE_PUBLIC);
        PUSH_TAG(PROT_NONE);
        if ((state = EXEC_TAG()) == 0) {
            result = eval(self, src, Qnil, file, line);
        }
        POP_TAG();
        SCOPE_SET(mode);

        POP_CREF();
        POP_FRAME();
        POP_CLASS();
        if (state) JUMP_TAG(state);

        return result;
    }
It seems that this pushes the singleton class of the object to CLASS, CREF and ruby_frame->cbase. The main process is a one-shot eval(). It is unusual that things such as initializing FRAME by a struct copy are missing, but this also does not make much difference.
Before being absorbed
Though the author said it becomes friendlier to read, it’s possible that it was already simple before being absorbed, so let’s check where it is simplified in comparison to the before-absorbed one.
The first one is specific_eval(). Since this function exists to share the code of the interface to Ruby, almost all of it is parsing the parameters. Here is the result of cutting all of that.
▼ specific_eval() (simplified)
    static VALUE
    specific_eval(argc, argv, klass, self)
        int argc;
        VALUE *argv;
        VALUE klass, self;
    {
        if (rb_block_given_p()) {

            return yield_under(klass, self);
        }
        else {

            return eval_under(klass, self, argv[0], file, line);
        }
    }
(eval.c)
As you can see, this branches cleanly in two ways based on whether there’s a block or not, and each route never influences the other. Therefore, when reading, we should read them one at a time. To begin with, the absorbed version is improved on this point.
And file and line are irrelevant when reading yield_under(), so when the yield route is absorbed into the main body, it may become obvious that we don’t have to think about the parsing of these parameters at all.
Next, we’ll look at eval_under() and eval_under_i().
▼ eval_under()
    static VALUE
    eval_under(under, self, src, file, line)
        VALUE under, self, src;
        const char *file;
        int line;
    {
        VALUE args[4];

        if (ruby_safe_level >= 4) {
            StringValue(src);
        }
        else {
            SafeStringValue(src);
        }
        args[0] = self;
        args[1] = src;
        args[2] = (VALUE)file;
        args[3] = (VALUE)line;
        return exec_under(eval_under_i, under, under, args);
    }

    static VALUE
    eval_under_i(args)
        VALUE *args;
    {
        return eval(args[0], args[1], Qnil, (char*)args[2], (int)args[3]);
    }
(eval.c)
In this function, in order to pass its arguments as a single one, it stores them into the args array and passes that. We can imagine that this args exists as a temporary container to pass things from eval_under() to eval_under_i(), but we cannot be sure that it is truly so. It’s possible that args is modified inside exec_under().
As a way to share code, this is a very proper way to do it. But for those who read it, this kind of indirect passing is hard to follow. In particular, because there are extra casts on file and line to fool the compiler, it is hard to imagine what their actual types were. The parts around this entirely disappeared in the absorbed version, so you don’t have to worry about getting lost.
However, it’s too much to say that absorbing and extracting always makes things easier to understand. For example, when calling exec_under(), under is passed as both the second and third arguments, but is it really all right to fold both parameter variables into under on the exec_under() side? That is to say, the second and third arguments of exec_under() in fact indicate the CLASS and the CREF that should be pushed. CLASS and CREF are “different things”, so it might be better to use different variables. Also in the previous absorbed version, for only this point,
    VALUE sclass = .....;
    VALUE cbase = sclass;
I thought of writing it this way, but also thought it could give a strange impression if only these variables were abruptly left, so it was extracted as sclass. This means that it is only because of the flow of the text.
By now I’ve extracted arguments and functions many times, and each time I explained the reason for extracting. The reasons are
there are only a few possible patterns
the behavior can slightly change
I’m definitely not saying “extracting various things, in whatever way, always makes things simpler”.
In any case, the first priority is comprehensibility for ourselves, not complying with a methodology. When extracting makes things simpler, extract. When we feel that not extracting, or conversely bundling things into a procedure, makes things easier to understand, let us do that. As for ruby, I often extracted things because the original is written properly, but if the source code was written by a poor programmer, aggressively bundling things into functions should often be a good choice.
Ruby Hacking Guide
Translated by Vincent ISAMBART
Chapter 18: Loading
Outline
Interface
At the Ruby level, there are two procedures that can be used for loading: require and load.
    require 'uri'            # load the uri library
    load '/home/foo/.myrc'   # read a resource file
They are both normal methods, compiled and evaluated exactly like any other code. It means loading occurs after compilation has given control to the evaluation stage.
These two functions each have their own use. require is for loading libraries, and load is for loading an arbitrary file. Let’s see this in more detail.
require
require has four features:
the file is searched for in the load path
it can load extension libraries
the .rb/.so extension can be omitted
a given file is never loaded more than once
Ruby’s load path is in the global variable $:, which contains an array of strings. For example, displaying the content of $: in the environment I usually use would show:
    % ruby -e 'puts $:'
    /usr/lib/ruby/site_ruby/1.7
    /usr/lib/ruby/site_ruby/1.7/i686-linux
    /usr/lib/ruby/site_ruby
    /usr/lib/ruby/1.7
    /usr/lib/ruby/1.7/i686-linux
    .
Calling puts on an array displays one element on each line, so it’s easy to read.
As I ran configure using --prefix=/usr, the library path is /usr/lib/ruby and below, but if you compile it normally from the source code, the libraries will be in /usr/local/lib/ruby and below. In a Windows environment, there will also be a drive letter.
Then, let’s try to require the standard library nkf.so from the load path.
    require 'nkf'
If the required name has no extension, require silently compensates. First, it tries with .rb, then with .so. On some platforms it also tries the platform’s specific extension for extension libraries, for example .dll in a Windows environment or .bundle on Mac OS X.
Let’s do a simulation in my environment. ruby checks the following paths in sequential order.
    /usr/lib/ruby/site_ruby/1.7/nkf.rb
    /usr/lib/ruby/site_ruby/1.7/nkf.so
    /usr/lib/ruby/site_ruby/1.7/i686-linux/nkf.rb
    /usr/lib/ruby/site_ruby/1.7/i686-linux/nkf.so
    /usr/lib/ruby/site_ruby/nkf.rb
    /usr/lib/ruby/site_ruby/nkf.so
    /usr/lib/ruby/1.7/nkf.rb
    /usr/lib/ruby/1.7/nkf.so
    /usr/lib/ruby/1.7/i686-linux/nkf.rb
    /usr/lib/ruby/1.7/i686-linux/nkf.so    found!
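The simulation above can be written as a few lines of Ruby. This is only a sketch that reproduces the order of the list as shown; candidate_paths is a hypothetical helper, not a function of ruby, and the default extension list is an assumption that covers just .rb and .so.

```ruby
# For each directory of the load path, try each extension in turn.
def candidate_paths(load_path, name, exts = [".rb", ".so"])
  load_path.flat_map do |dir|
    exts.map { |ext| File.join(dir, name + ext) }
  end
end

load_path = [
  "/usr/lib/ruby/site_ruby/1.7",
  "/usr/lib/ruby/site_ruby/1.7/i686-linux",
  "/usr/lib/ruby/site_ruby",
  "/usr/lib/ruby/1.7",
  "/usr/lib/ruby/1.7/i686-linux",
]
puts candidate_paths(load_path, "nkf")
```

The first candidate that actually exists on disk would be the one that gets loaded.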
nkf.so has been found in /usr/lib/ruby/1.7/i686-linux. Once the file has been found, require’s last feature (not loading the file more than once) locks the file. The locks are strings put in the global variable $". In our case the string "nkf.so" has been put there. Even if the extension has been omitted when calling require, the file name in $" has the extension.
    require 'nkf'   # after loading nkf...
    p $"            # ["nkf.so"]  the file is locked

    require 'nkf'   # nothing happens if we require it again
    p $"            # ["nkf.so"]  the content of the lock array does not change
There are two reasons for adding the missing extension. The first one is not to load it twice if the same file is later required with its extension. The second one is to be able to load both nkf.rb and nkf.so. In fact the extensions are disparate (.so .dll .bundle etc.) depending on the platform, but at locking time they all become .so. That’s why when writing a Ruby program you can ignore the differences of extensions and consider it’s always .so. So you can say that ruby is quite UNIX oriented.
By the way, $" can be freely modified even at the Ruby level, so we cannot say it’s a strong lock. You can for example load an extension library multiple times if you clear $".
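The locking behavior can be checked with a throwaway file. The sketch below makes a few assumptions of its own: require_twice and the file name rhg_lock_demo.rb are hypothetical, and it uses $LOADED_FEATURES, the English alias of $". In current versions of Ruby the recorded string is typically a full path rather than a bare file name, which is why the check only looks at the end of each entry.

```ruby
require 'tmpdir'

def require_twice
  Dir.mktmpdir do |dir|
    path = File.join(dir, "rhg_lock_demo.rb")
    File.write(path, "RHG_LOCK_DEMO = true\n")
    first  = require(path)    # the file is loaded and locked
    second = require(path)    # already locked, nothing happens
    locked = $LOADED_FEATURES.any? { |f| f.end_with?("rhg_lock_demo.rb") }
    [first, second, locked]
  end
end

p require_twice
```

The return value of require tells the two cases apart: it is true when the file was actually loaded and false when the lock was already there.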
load
load is a lot simpler than require. Like require, it searches for the file in $:. But it can only load Ruby programs. Furthermore, the extension cannot be omitted: the complete file name must always be given.
    load 'uri.rb'   # load the URI library that is part of the standard library
In this simple example we try to load a library, but the proper way to use load is, for example, to load a resource file giving its full path.
Flow of the whole process
If we roughly split it, “loading a file” can be split into:
finding the file
reading the file and mapping it to an internal form
evaluating it
The only difference between require and load is how the file is found. The rest is the same in both.
We will expand on the last part, evaluation, a little more. Loaded Ruby programs are basically evaluated at the top-level. It means the defined constants will be top-level constants and the defined methods will be function-style methods.
    ### mylib.rb
    MY_OBJECT = Object.new
    def my_p(obj)
      p obj
    end

    ### first.rb
    require 'mylib'
    my_p MY_OBJECT   # we can use the constants and methods defined in another file
Only the local variable scope of the top-level changes when the file changes. In other words, local variables cannot be shared between different files. You can of course share them using, for example, Proc, but this has nothing to do with the load mechanism.
Some people also misunderstand the loading mechanism. Whatever class you are in when you call load, it does not change anything. Even if, like in the following example, you load a file inside a module statement, it does not serve any purpose, as everything that is at the top-level of the loaded file is put at the Ruby top-level.
    require 'mylib'     # whatever the place you require from, be it at the top-level
    module SandBox
      require 'mylib'   # or in a module, the result is the same
    end
Highlights of this chapter
With the above knowledge in mind, we are going to read the code. But because the specification is defined in a very detailed way this time, if we simply read it, it could turn into a mere enumeration of code. Therefore, in this chapter, we are going to narrow the target down to the following 3 points:
loading serialisation
the repartition of the functions in the different source files
how extension libraries are loaded
Regarding the first point, you will understand it when you see it.
For the second point, the functions that appear in this chapter come from 4 different files: eval.c, ruby.c, file.c and dln.c. Why is it this way? We’ll try to think about the realistic situation behind it.
The third point is just what its name says. We will see how the currently popular trend of execution-time loading, more commonly referred to as plug-ins, works. This is the most interesting part of this chapter, so I’d like to use as many pages as possible to talk about it.
Searching the library
rb_f_require()
The body of require is rb_f_require. First, we will only look at the part concerning the file search. Having many different cases is bothersome, so we will limit ourselves to the case when no file extension is given.
▼ rb_f_require() (simplified version)
VALUE
rb_f_require(obj, fname)
    VALUE obj, fname;
{
    VALUE feature, tmp;
    char *ext, *ftptr; /* OK */
    int state;
    volatile int safe = ruby_safe_level;

    SafeStringValue(fname);
    ext = strrchr(RSTRING(fname)->ptr, '.');
    if (ext) {
        /* ...if the file extension has been given... */
    }
    tmp = fname;
    switch (rb_find_file_ext(&tmp, loadable_ext)) {
      case 0:
        break;

      case 1:
        feature = fname = tmp;
        goto load_rb;

      default:
        feature = tmp;
        fname = rb_find_file(tmp);
        goto load_dyna;
    }
    if (rb_feature_p(RSTRING(fname)->ptr, Qfalse))
        return Qfalse;
    rb_raise(rb_eLoadError, "No such file to load -- %s",
             RSTRING(fname)->ptr);

  load_dyna:
    /* ...load an extension library... */
    return Qtrue;

  load_rb:
    /* ...load a Ruby program... */
    return Qtrue;
}

static const char *const loadable_ext[] = {
    ".rb", DLEXT,    /* DLEXT=".so", ".dll", ".bundle"... */
#ifdef DLEXT2
    DLEXT2,          /* DLEXT2=".dll" on Cygwin, MinGW */
#endif
    0
};

(eval.c)
In this function the goto labels load_rb and load_dyna are actually like subroutines, and the two variables feature and fname are more or less their parameters. These variables have the following meaning.

variable   meaning                                         example
feature    the library file name that will be put in $"    uri.rb, nkf.so
fname      the full path to the library                    /usr/lib/ruby/1.7/uri.rb

The name feature can be found in the function rb_feature_p(). This function checks if a file has been locked (we will look at it just after).
The functions actually searching for the library are rb_find_file() and rb_find_file_ext(). rb_find_file() searches for a file in the load path $:. rb_find_file_ext() does the same, but the difference is that it takes as a second parameter a list of extensions (i.e. loadable_ext) and tries them in sequential order.
Below we will first look at the whole of the file searching code, then we will look at the code of the require lock in load_rb.
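Before reading the C code, it may help to see the idea of rb_find_file_ext() written in Ruby. This is only a rough sketch under assumptions: hg_find_file_ext and HG_LOADABLE_EXT are made-up names, ".so" stands in for DLEXT, and the nesting order of the two loops is my guess rather than something taken from the source. The return value imitates the way rb_f_require() uses it: 0 when nothing is found, 1 for a Ruby program, a larger number for an extension library.

```ruby
require "tmpdir"

# Illustrative stand-in for loadable_ext; ".so" plays the role of DLEXT.
HG_LOADABLE_EXT = [".rb", ".so"].freeze

# Hypothetical Ruby rendering of the search: returns [0, nil] when nothing
# is found, otherwise [position of the matching extension + 1, full path].
def hg_find_file_ext(name, load_path, exts = HG_LOADABLE_EXT)
  exts.each_with_index do |ext, i|
    load_path.each do |dir|
      candidate = File.join(dir, name + ext)
      return [i + 1, candidate] if File.file?(candidate)
    end
  end
  [0, nil]
end

HG_FIND_RESULT = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "hg_script.rb"), "")   # a "Ruby program"
  File.write(File.join(dir, "hg_ext.so"), "")      # an "extension library"
  %w[hg_script hg_ext hg_missing].map { |name| hg_find_file_ext(name, [dir]).first }
end
p HG_FIND_RESULT
```

The three numbers correspond to the three branches of the switch statement: case 1 goes to load_rb, the default case goes to load_dyna, and case 0 falls through to the error.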
rb_find_file()
First, the file search continues in rb_find_file(). This function searches for the file path in the global load path $: (rb_load_path). The string contamination check is tiresome, so we'll only look at the main part.
▼ rb_find_file() (simplified version)
VALUE
rb_find_file(path)
    VALUE path;
{
    VALUE tmp;
    char *f = RSTRING(path)->ptr;
    char *lpath;

    if (rb_load_path) {
        long i;

        Check_Type(rb_load_path, T_ARRAY);
        tmp = rb_ary_new();
        for (i=0; i<RARRAY(rb_load_path)->len; i++) {
            VALUE str = RARRAY(rb_load_path)->ptr[i];
            SafeStringValue(str);
            if (RSTRING(str)->len > 0) {
                rb_ary_push(tmp, str);
            }
        }
        tmp = rb_ary_join(tmp, rb_str_new2(PATH_SEP));
        if (RSTRING(tmp)->len == 0) {
            lpath = 0;
        }
        else {
            lpath = RSTRING(tmp)->ptr;
        }
    }

    f = dln_find_file(f, lpath);
    if (file_load_ok(f)) {
        return rb_str_new2(f);
    }
    return 0;
}

(file.c)
If we write what happens in Ruby we get the following:

tmp = []                              # make an array
$:.each do |path|                     # repeat on each element of the load path
  tmp.push path if path.length > 0    # check the path and push it
end
lpath = tmp.join(PATH_SEP)            # concatenate all elements in one string separated by PATH_SEP

dln_find_file(f, lpath)               # main processing

PATH_SEP is the path separator: ':' under UNIX, ';' under Windows. rb_ary_join() creates a string by putting this separator between the elements. In other words, the load path that had become an array is turned back into a string with a separator.
Why? It's only because dln_find_file() takes the paths as a string with PATH_SEP as a separator. But why is dln_find_file() implemented like that? It's just because dln.c is not a library for ruby. Even if it has been written by the same author, it's a general purpose library. It is precisely for this reason that, when I sorted the files by category in the Introduction, I put this file in the Utility category. General purpose libraries cannot receive Ruby objects as parameters or read ruby global variables.
dln_find_file() also expands, for example, ~ to the home directory, but in fact this is already done in the omitted part of rb_find_file(). So in ruby's case it's not necessary.
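The round trip from array to string and back can be tried directly. In the sketch below, hg_find_in_path_string is a made-up stand-in for a general-purpose search routine that, like dln_find_file(), only understands one separator-joined string; it does not try to reproduce what the real function does beyond that, and File::PATH_SEPARATOR plays the role of PATH_SEP.

```ruby
require "tmpdir"

# Hypothetical stand-in for a general-purpose search routine: it knows
# nothing about Ruby arrays, only a single separator-joined string.
def hg_find_in_path_string(file, lpath, sep = File::PATH_SEPARATOR)
  lpath.split(sep).each do |dir|
    candidate = File.join(dir, file)
    return candidate if File.file?(candidate)
  end
  nil
end

HG_PATH_RESULT = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "hg_target.rb"), "")
  load_path = ["", File.join(dir, "hg_no_such_dir"), dir]
  # Same preparation as in rb_find_file(): drop empty entries, then join.
  lpath = load_path.reject(&:empty?).join(File::PATH_SEPARATOR)
  found = hg_find_in_path_string("hg_target.rb", lpath)
  [found == File.join(dir, "hg_target.rb"), hg_find_in_path_string("hg_absent.rb", lpath)]
end
p HG_PATH_RESULT
```

The caller owns the array and the library owns the string, which is the price of keeping the library free of Ruby objects.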
Loading wait
Here, the file search is finished quickly. Next comes the loading code. Or more accurately, it is the code "up to just before the load". The code of rb_f_require()'s load_rb is shown below.
▼ rb_f_require(): load_rb
  load_rb:
    if (rb_feature_p(RSTRING(feature)->ptr, Qtrue))
        return Qfalse;
    ruby_safe_level = 0;
    rb_provide_feature(feature);
    /* the loading of Ruby programs is serialised */
    if (!loading_tbl) {
        loading_tbl = st_init_strtable();
    }
    /* partial state */
    ftptr = ruby_strdup(RSTRING(feature)->ptr);
    st_insert(loading_tbl, ftptr, curr_thread);
    /* ...load the Ruby program and evaluate it... */
    st_delete(loading_tbl, &ftptr, 0); /* loading done */
    free(ftptr);
    ruby_safe_level = safe;

(eval.c)
As mentioned above, rb_feature_p() checks if a lock has been put in $". And rb_provide_feature() pushes a string in $", in other words locks the file.
The problem comes after. As the comment says, "the loading of Ruby programs is serialised". In other words, a file can only be loaded from one thread, and if during the loading another thread tries to load the same file, that thread will wait for the first loading to be finished. If it were not the case:

Thread.fork {
    require 'foo'   # At the beginning of require, foo.rb is added to $"
}                   # However the thread changes during the evaluation of foo.rb
require 'foo'   # foo.rb is already in $" so the function returns immediately
# (A) the classes of foo are used...

By doing something like this, even though the foo library is not really loaded, the code at (A) ends up being executed.
The mechanism to enter the waiting state is very simple. A st_table is created in the loading_tbl global variable, and a "feature => loading thread" association is recorded in it. curr_thread is a variable from eval.c, and its value is the currently running thread. That makes an exclusive lock. And in rb_feature_p(), we wait for the loading thread to end like the following.
▼ rb_feature_p() (second half)
    rb_thread_t th;

    while (st_lookup(loading_tbl, f, &th)) {
        if (th == curr_thread) {
            return Qtrue;
        }
        CHECK_INTS;
        rb_thread_schedule();
    }

(eval.c)
When rb_thread_schedule() is called, control is transferred to another thread, and this function only returns after control has come back to the thread that called it. When the file name disappears from loading_tbl, the loading is finished so the function can end. The curr_thread check is there so that a thread does not lock itself out (figure 1).
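The waiting mechanism can be imitated at the Ruby level. The following is a simplified model and not a transcription of eval.c: HG_LOADING plays the role of loading_tbl, HG_LOADED plays the role of $", Thread.pass plays the role of rb_thread_schedule(), and all of these names are invented. A Mutex is added only because the threads of a current ruby can be switched at any moment, and the feature is recorded as loaded at the end rather than at the beginning, which is a liberty of the model.

```ruby
HG_LOADING = {}          # feature name => thread that is loading it
HG_LOADED  = []          # features whose loading is complete
HG_GUARD   = Mutex.new   # protects the two tables above

# Hypothetical serialised require: the block stands for "load the file".
def hg_serialised_require(feature)
  loop do
    state = HG_GUARD.synchronize do
      if HG_LOADING.key?(feature)
        HG_LOADING[feature] == Thread.current ? :reentrant : :busy
      elsif HG_LOADED.include?(feature)
        :done
      else
        HG_LOADING[feature] = Thread.current   # take the lock
        :mine
      end
    end
    case state
    when :busy then Thread.pass   # let the loading thread make progress
    when :mine then break
    else return false             # already loaded, or we are the loader
    end
  end
  begin
    yield
    HG_GUARD.synchronize { HG_LOADED << feature }
  ensure
    HG_GUARD.synchronize { HG_LOADING.delete(feature) }
  end
  true
end

def hg_serialisation_demo
  started = Queue.new
  events  = Queue.new
  loader = Thread.new do
    hg_serialised_require("foo") do
      started << true
      sleep 0.05                  # pretend that evaluating foo.rb takes a while
      events << :foo_defined
    end
  end
  started.pop                     # make sure the other thread is in mid-load
  second = hg_serialised_require("foo") { events << :loaded_twice }
  events << :after_second_require
  log = []
  first = loader.value
  log << events.pop until events.empty?
  [first, second, log]
end

HG_SERIAL_RESULT = hg_serialisation_demo
p HG_SERIAL_RESULT
```

The second call returns false only after :foo_defined has been recorded, so the code standing at (A) can never run ahead of the definitions it depends on.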
Figure 1: Serialisation of loads
Loading of Ruby programs
rb_load()
We will now look at the loading process itself. Let's start with the part inside rb_f_require()'s load_rb that loads Ruby programs.
▼ rb_f_require() - load_rb - loading
    PUSH_TAG(PROT_NONE);
    if ((state = EXEC_TAG()) == 0) {
        rb_load(fname, 0);
    }
    POP_TAG();

(eval.c)
The rb_load() which is called here is actually the "meat" of the Ruby-level load. This means it needs to search once again, but looking at the same procedure once again is too much trouble. Therefore, that part is omitted in the code below.
And the second argument wrap is folded to 0, because it is 0 in the calling code above.
▼ rb_load() (simplified edition)
void
rb_load(fname, /* wrap=0 */)
    VALUE fname;
{
    int state;
    volatile ID last_func;
    volatile VALUE wrapper = 0;
    volatile VALUE self = ruby_top_self;
    NODE *saved_cref = ruby_cref;

    PUSH_VARS();
    PUSH_CLASS();
    ruby_class = rb_cObject;
    ruby_cref = top_cref;           /* (A-1) change CREF */
    wrapper = ruby_wrapper;
    ruby_wrapper = 0;
    PUSH_FRAME();
    ruby_frame->last_func = 0;
    ruby_frame->last_class = 0;
    ruby_frame->self = self;
    /* (A-2) change ruby_frame->cbase */
    ruby_frame->cbase = (VALUE)rb_node_newnode(NODE_CREF,ruby_class,0,0);
    PUSH_SCOPE();
    /* at the top-level the visibility is private by default */
    SCOPE_SET(SCOPE_PRIVATE);
    PUSH_TAG(PROT_NONE);
    ruby_errinfo = Qnil;            /* make sure it's nil */
    state = EXEC_TAG();
    last_func = ruby_frame->last_func;
    if (state == 0) {
        NODE *node;

        /* (B) this is dealt with as eval for some reasons */
        ruby_in_eval++;
        rb_load_file(RSTRING(fname)->ptr);
        ruby_in_eval--;
        node = ruby_eval_tree;
        if (ruby_nerrs == 0) {      /* no parse error occurred */
            eval_node(self, node);
        }
    }
    ruby_frame->last_func = last_func;
    POP_TAG();
    ruby_cref = saved_cref;
    POP_SCOPE();
    POP_FRAME();
    POP_CLASS();
    POP_VARS();
    ruby_wrapper = wrapper;
    if (ruby_nerrs > 0) {           /* a parse error occurred */
        ruby_nerrs = 0;
        rb_exc_raise(ruby_errinfo);
    }
    if (state) jump_tag_but_local_jump(state);
    if (!NIL_P(ruby_errinfo))       /* an exception was raised during the loading */
        rb_exc_raise(ruby_errinfo);
}
Just when we thought we had come through the storm of stack manipulations, we are in it again. Although this is tough, let's cheer up and read it.
As long functions usually are, almost all of the code is occupied by idioms: PUSH/POP, tag protection and re-jumping. Among them, what we want to focus on are the things at (A), which relate to CREF. Since a loaded program is always executed at the top-level, it sets aside (does not push) ruby_cref and brings back top_cref. ruby_frame->cbase also becomes a new one.
And one more place: at (B), ruby_in_eval is somehow turned on. What is the part influenced by this variable? I investigated it, and it turned out that it seems to be only rb_compile_error(). When ruby_in_eval is true, the message is stored in the exception object, but when it is not true, the message is printed to stderr. In other words, when it is a parse error of the main program of the command, it wants to print directly to stderr, but inside the evaluator this is not appropriate, so it refrains from doing so. It seems the "eval" of ruby_in_eval means neither the eval method nor the eval() function but "evaluate" as a general noun. Or, it's possible that it indicates eval.c.
rb_load_file()
Then, all of a sudden, the source file here is ruby.c. Or to put it more accurately, it would essentially be preferable for the entire loading code to be put in ruby.c, but rb_load() has no choice but to use PUSH_TAG and such. Therefore, putting it in eval.c is inevitable. If it were not the case, all of it would have been put in eval.c in the first place.
Then, it is rb_load_file().
▼ rb_load_file()
void
rb_load_file(fname)
    char *fname;
{
    load_file(fname, 0);
}

(ruby.c)
Delegated entirely. The second argument script of load_file() is a boolean value, and it indicates whether it is loading the file given as the argument of the ruby command. Now, because we'd like to assume we are loading a library, let's fold it by replacing it with script=0. Furthermore, in the code below, taking the meaning into account as well, nonessential things have already been removed.
▼ load_file() (simplified edition)
static void
load_file(fname, /* script=0 */)
    char *fname;
{
    VALUE f;
    {
        FILE *fp = fopen(fname, "r");   (A)
        if (fp == NULL) {
            rb_load_fail(fname);
        }
        fclose(fp);
    }
    f = rb_file_open(fname, "r");       (B)
    rb_compile_file(fname, f, 1);       (C)
    rb_io_close(f);
}
(A) The call to fopen() is to check if the file can be opened. If there is no problem, it's immediately closed. It may seem a little useless, but it's an extremely simple and yet highly portable and reliable way to do it.
(B) The file is opened once again, this time using the Ruby level library File.open. The file was not opened with File.open from the beginning so as not to raise any Ruby exception. Here, if any exception occurred, we would like to have a loading error, but getting the errors related to open, for example Errno::ENOENT, Errno::EACCES..., would be problematic. We are in ruby.c, so we cannot stop a tag jump.
(C) Using the parser interface rb_compile_file(), the program is read from an IO object and compiled into a syntax tree. The syntax tree is added to ruby_eval_tree, so there is no need to get the result.
That's all for the loading code. Finally, the calls were quite deep, so the call graph of rb_f_require() is shown below.

rb_f_require           ....eval.c
    rb_find_file            ....file.c
        dln_find_file           ....dln.c
            dln_find_file_1
    rb_load
        rb_load_file            ....ruby.c
            load_file
                rb_compile_file     ....parse.y
        eval_node

You must bring call graphs on a long trip. It's common knowledge.
The number of open required for loading
Previously, there was an open used just to check if a file can be opened, but in fact, during the loading process of ruby, other functions such as rb_find_file_ext() also do checks internally by using open. How many times is open() called in the whole process?
If you're wondering about that, actually counting is the right attitude for a programmer. We can easily count it by using a system call tracer. The tool to use would be strace on Linux, truss on Solaris, ktrace or truss on BSD. As you can see, the name is different on each OS and there's no consistency, but you can find them by googling.
If you're using Windows, your IDE will probably have a tracer built in. Well, as my main environment is Linux, I looked using strace.
The output is written to stderr, so it was redirected using 2>&1.

% strace ruby -e 'require "rational"' 2>&1 | grep '^open'
open("/etc/ld.so.preload", O_RDONLY)    = -1 ENOENT
open("/etc/ld.so.cache", O_RDONLY)      = 3
open("/usr/lib/libruby-1.7.so.1.7", O_RDONLY) = 3
open("/lib/libdl.so.2", O_RDONLY)       = 3
open("/lib/libcrypt.so.1", O_RDONLY)    = 3
open("/lib/libc.so.6", O_RDONLY)        = 3
open("/usr/lib/ruby/1.7/rational.rb", O_RDONLY|O_LARGEFILE) = 3
open("/usr/lib/ruby/1.7/rational.rb", O_RDONLY|O_LARGEFILE) = 3
open("/usr/lib/ruby/1.7/rational.rb", O_RDONLY|O_LARGEFILE) = 3
open("/usr/lib/ruby/1.7/rational.rb", O_RDONLY|O_LARGEFILE) = 3

The opens up to libc.so.6 are the ones used in the implementation of dynamic linking, and then there are four other opens, all of the same file. Thus it seems that three of them are useless.
Loading of extension libraries
rb_f_require() - load_dyna
This time we will see the loading of extension libraries. We will start with rb_f_require()'s load_dyna. However, we do not need the part about locking anymore, so it was removed.
▼ rb_f_require() - load_dyna
    {
        int volatile old_vmode = scope_vmode;

        PUSH_TAG(PROT_NONE);
        if ((state = EXEC_TAG()) == 0) {
            void *handle;

            SCOPE_SET(SCOPE_PUBLIC);
            handle = dln_load(RSTRING(fname)->ptr);
            rb_ary_push(ruby_dln_librefs, LONG2NUM((long)handle));
        }
        POP_TAG();
        SCOPE_SET(old_vmode);
    }
    if (state) JUMP_TAG(state);

(eval.c)
By now, there is very little here which is novel. The tags are used only in the idiomatic way, and saving and restoring the visibility scope is done in the way we have become used to seeing. All that remains is dln_load(). What on earth is that for? For the answer, continue to the next section.
Brush up about links
dln_load() is loading an extension library, but what does loading an extension library mean? To talk about it, we need to dramatically roll the discussion back to the physical world, and start with links.
I think compiling C programs is, of course, not a new thing for you. Since I'm using gcc on Linux, I can create a runnable program in the following manner.

% gcc hello.c

Judging by the file name, this is probably a "Hello, World!" program. On UNIX, gcc outputs a program into a file named a.out by default, so you can subsequently execute it in the following way:

% ./a.out
Hello, World!

It is created properly.
By the way, what is gcc actually doing here? Usually we just say "compile", but actually there are these four steps:

1. preprocess (cpp)
2. compile C into assembly (cc)
3. assemble the assembly language into machine code (as)
4. link (ld)

Among them, preprocessing, compiling and assembling are described in a lot of places, but the description often ends without clearly describing the linking phase. It is like a history class in school which never reaches the "modern age". Therefore, in this book, trying to provide the missing part, I'll briefly summarize what linking is.
A program that has finished the assembling phase becomes an "object file" in some format. The following are some of the major formats.

ELF, Executable and Linking Format (recent UNIX)
a.out, assembler output (relatively old UNIX)
COFF, Common Object File Format (Win32)

It might go without saying that a.out as an object file format and a.out as the default output file name of cc are totally different things. For example, on modern Linux, when we create it in the ordinary way, an a.out file in ELF format is created.
How these object file formats differ from each other is not important now. What we have to recognize now is that all of these object files can be considered as "a set of names". For example, the function names and the variable names which exist in this file.
And the sets of names contained in an object file are of two types.

the set of necessary names (for instance, the external functions called internally, e.g. printf)
the set of providing names (for instance, the functions defined internally, e.g. hello)

Linking means, when gathering multiple object files, checking that "the set of providing names" contains "the set of necessary names" entirely, and connecting them to each other. In other words, if we pull a line from each of "the necessary names", each line must be connected to one of "the providing names" of a particular object file (Figure 2). To put this in technical terms, it is resolving undefined symbols.
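The "set of names" model is simple enough to write down. The sketch below is only an illustration of the idealised model described above: HgObjectFile, hg_undefined_symbols and the contents of the three pretend object files are all invented, and no real object file is read.

```ruby
require "set"

# Hypothetical object file, reduced to its two sets of names.
HgObjectFile = Struct.new(:name, :needs, :provides)

# Returns the necessary names that no object file provides,
# i.e. the symbols that remain undefined after the logical linking.
def hg_undefined_symbols(objects)
  provided = objects.map(&:provides).reduce(Set.new, :|)
  needed   = objects.map(&:needs).reduce(Set.new, :|)
  needed - provided
end

main_o  = HgObjectFile.new("main.o",  Set["hello"],  Set["main"])
hello_o = HgObjectFile.new("hello.o", Set["printf"], Set["hello"])
libc    = HgObjectFile.new("libc",    Set[],         Set["printf"])

HG_LINK_RESULT = [
  hg_undefined_symbols([main_o, hello_o]).to_a,        # printf is missing
  hg_undefined_symbols([main_o, hello_o, libc]).to_a   # everything is resolved
]
p HG_LINK_RESULT
```

A non-empty result corresponds to the familiar "undefined reference" error, and an empty one means every line could be connected.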
Figure 2: object files and linking
Logically this is how it is, but in reality a program can't run with this alone. At least, C programs cannot run without converting the names to addresses (numbers).
So, after the logical conjunctions, the physical conjunctions become necessary. We have to map object files into the real memory space and substitute all the names with numbers. Concretely speaking, for instance, the addresses to jump to on function calls are adjusted here.
And, based on when these two kinds of conjunctions are done, linking is divided into two types: static linking and dynamic linking. Static linking finishes all the phases at compile time. Dynamic linking, on the other hand, defers some of the conjunctions to execution time, and linking is finally completed when executing.
However, what is explained here is a very simple, idealistic model, and in some respects it distorts reality a lot. Logical conjunctions and physical conjunctions are not so completely separated, and "an object file is a set of names" is too naive. But the behavior around this differs considerably from platform to platform, and describing it seriously would end up as one more book. To obtain realistic-level knowledge, I additionally recommend reading "Expert C Programming: Deep C Secrets" by Peter van der Linden and "Linkers and Loaders" by John R. Levine.
Linking that is truly dynamic
And finally we get into our main topic. The "dynamic" in "dynamic linking" naturally means it "occurs at execution time", but what people usually refer to as "dynamic linking" is pretty much decided already at compile time. For example, the names of the needed functions, and which library they can be found in, are already known. For instance, if you need cos(), you know it's in libm, so you use gcc -lm. If you didn't specify the correct library at compile time, you'd get a link error.
But extension libraries are different. Neither the names of the needed functions nor the name of the library which defines them are known at compile time. We need to construct a string at execution time, then load and link. It means that even "the logical conjunctions" in the sense of the previous discussion have to be done entirely at execution time. In order to do it, another mechanism that is a little different from ordinary dynamic linking is required.
This manipulation, linking that is entirely decided at runtime, is usually called "dynamic load".
Dynamic load API
I've finished explaining the concept. The rest is how to do that dynamic loading. This is not a difficult thing. Usually there's a specific API prepared in the system, and we can accomplish it by merely calling it.
For example, what is relatively widespread on UNIX is the API named dlopen. However, I can't say "It is always available on UNIX". For example, a slightly older HP-UX has a totally different interface, and a NeXT-flavor API is used on Mac OS X. And even if it is the same dlopen, it is included in libc on BSD-derived OSes, and it is attached from outside as libdl on Linux. Therefore, it is desperately not portable. Since it differs even among UNIX-based platforms, it is obviously completely different on the other operating systems. It is unlikely that the same API is used.
So, what ruby does in order to absorb the totally different interfaces is to prepare the file named dln.c. dln is probably an abbreviation of "dynamic link". dln_load() is one of the functions of dln.c.
While the dynamic loading APIs are totally different from each other, the only saving grace is that the usage pattern of the API is completely the same. Whichever platform you are on, it consists of these three steps:

1. map the library to the address space of the process
2. take the pointers to the functions contained in the library
3. unmap the library

For example, if it is a dlopen-based API, the correspondences are

1. dlopen
2. dlsym
3. dlclose

If it is the Win32 API, the correspondences are

1. LoadLibrary (or LoadLibraryEx)
2. GetProcAddress
3. FreeLibrary

Lastly, I'll talk about what dln_load() is doing by using these APIs. It is, in fact, calling Init_xxxx(). Having reached this point, we are finally able to illustrate the entire process of ruby from invocation to completion without any gaps. In other words, when ruby is invoked, it initializes the evaluator and starts evaluating a program passed in some way. If require or load occurs during the process, it loads the library and transfers control. Transferring control means parsing and evaluating if it is a Ruby library, and it means loading and linking and finally calling Init_xxxx() if it is an extension library.
dln_load()
Finally, we've reached the content of dln_load(). dln_load() is also a long function, but its structure is simple for certain reasons. Take a look at the outline first.
▼ dln_load() (outline)
void*
dln_load(file)
    const char *file;
{
#if defined _WIN32 && !defined __CYGWIN__
    load with Win32 API
#else
    initialization depending on each platform
#ifdef each platform
    ……routines for each platform……
#endif
#endif
#if !defined(_AIX) && !defined(NeXT)
  failed:
    rb_loaderror("%s - %s", error, file);
#endif
    return 0;   /* dummy return */
}

This way, the main part is completely separated for each platform. When thinking about it, we only have to think about one platform at a time. The supported APIs are as follows:

dlopen (most of UNIX)
LoadLibrary (Win32)
shl_load (a bit old HP-UX)
a.out (very old UNIX)
rld_load (before NeXT4)
dyld (NeXT or Mac OS X)
get_image_symbol (BeOS)
GetDiskFragment (Mac OS 9 and before)
load (a bit old AIX)
dln_load() - dlopen()
First, let's start with the API code for the dlopen series.
▼ dln_load() - dlopen()
void*
dln_load(file)
    const char *file;
{
    const char *error = 0;
#define DLN_ERROR() (error = dln_strerror(),\
                     strcpy(ALLOCA_N(char, strlen(error) + 1), error))
    char *buf;
    /* write a string "Init_xxxx" to buf (the space is allocated with alloca) */
    init_funcname(&buf, file);

    {
        void *handle;
        void (*init_fct)();

#ifndef RTLD_LAZY
# define RTLD_LAZY 1
#endif
#ifndef RTLD_GLOBAL
# define RTLD_GLOBAL 0
#endif

        /* (A) load the library */
        if ((handle = (void*)dlopen(file, RTLD_LAZY | RTLD_GLOBAL)) == NULL) {
            error = dln_strerror();
            goto failed;
        }
        /* (B) get the pointer to Init_xxxx() */
        init_fct = (void(*)())dlsym(handle, buf);
        if (init_fct == NULL) {
            error = DLN_ERROR();
            dlclose(handle);
            goto failed;
        }
        /* (C) call Init_xxxx() */
        (*init_fct)();

        return handle;
    }

  failed:
    rb_loaderror("%s - %s", error, file);
}

(dln.c)
(A) The RTLD_LAZY argument of dlopen() means "resolve the undefined symbols when the functions are actually demanded". The return value is the mark (handle) that identifies the library, and we always need to pass it when using dl*().
(B) dlsym() gets a function pointer from the library specified by the handle. If the return value is NULL, it means failure. Here, the pointer to Init_xxxx() is obtained and called.
dlclose() is not called here. Since pointers to the functions of the loaded library may be handed out inside Init_xxxx(), it would be troublesome if dlclose() were done, because the entire library would become unusable. Thus, we can't call dlclose() until the process finishes.
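The only string that has to be built at execution time here is the name passed to dlsym(). As a rough illustration, and assuming that init_funcname() simply takes the base name of the file up to its first dot and prefixes it with Init_, the derivation could be sketched like this; hg_init_funcname is a made-up helper, not a function of dln.c.

```ruby
# Hypothetical rendering of the name that init_funcname() is assumed to
# produce: "Init_" followed by the base name with its extension removed.
def hg_init_funcname(file)
  "Init_" + File.basename(file).sub(/\..*\z/, "")
end

HG_INIT_RESULT = [
  hg_init_funcname("/usr/lib/ruby/1.7/i386-linux/nkf.so"),
  hg_init_funcname("socket.bundle")
]
p HG_INIT_RESULT
```

This is why an extension library named nkf has to define a C function named Init_nkf: the name is the only contract between ruby and the library.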
dln_load() - Win32
As for Win32, LoadLibrary() and GetProcAddress() are used. It is a very general Win32 API which also appears on MSDN.
▼ dln_load() - Win32
void*
dln_load(file)
    const char *file;
{
    HINSTANCE handle;
    char winfile[MAXPATHLEN];
    void (*init_fct)();
    char *buf;

    if (strlen(file) >= MAXPATHLEN) rb_loaderror("filename too long");

    /* write the "Init_xxxx" string to buf (the space is allocated with alloca) */
    init_funcname(&buf, file);

    strcpy(winfile, file);

    /* load the library */
    if ((handle = LoadLibrary(winfile)) == NULL) {
        error = dln_strerror();
        goto failed;
    }

    if ((init_fct = (void(*)())GetProcAddress(handle, buf)) == NULL) {
        rb_loaderror("%s - %s\n%s", dln_strerror(), buf, file);
    }

    /* call Init_xxxx() */
    (*init_fct)();
    return handle;

  failed:
    rb_loaderror("%s - %s", error, file);
}

(dln.c)
LoadLibrary() is called, then GetProcAddress(). The pattern is so similar to the previous one that nothing is left to say, so I decided to end this chapter.
The original work is Copyright © 2002 - 2004 Minero AOKI. Translated by Vincent ISAMBART and Clifford Escobar CAOILE
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License
RubyHackingGuide
Chapter 19: Threads
Outline
Ruby Interface
Come to think of it, I feel I have not yet introduced actual code that uses Ruby threads. It is nothing special, but I'll introduce it here just in case.

Thread.fork {
    while true
      puts 'forked thread'
    end
}
while true
  puts 'main thread'
end

When executing this program, a lot of "forked thread" and "main thread" lines are printed, properly mixed together.
Of course, besides just creating multiple threads, there are also various ways to control them. There is no synchronize reserved word as in Java, but common primitives such as Mutex, Queue or Monitor are of course available, and the APIs below can be used to control a thread itself.
▼ Thread API

Thread.pass       transfer the execution to any other thread
Thread.kill(th)   terminate the th thread
Thread.exit       terminate the thread itself
Thread.stop       temporarily stop the thread itself
Thread#join       wait for the thread to finish
Thread#wakeup     wake up the temporarily stopped thread
ruby Thread
Threads are supposed to "run all together", but actually they run for a little time each, in turns. To be precise, with some effort on a machine with multiple CPUs, it's possible that, for instance, two of them are running at the same time. But still, if there are more threads than CPUs, they have to run in turns.
In other words, in order to create threads, someone has to switch the threads somewhere. There are roughly two ways to do it: kernel-level threads and user-level threads. As the names suggest, they respectively create threads in the kernel or at user level. If it is kernel-level, multiple threads can run at the same time by making use of multiple CPUs.
Then, how about the threads of ruby? They are user-level threads. And therefore, the number of threads that can run at the same time is limited to one.
Is it preemptive?
I'll describe the traits of ruby threads in more detail. Another point of view from which to look at threads is the question "is it preemptive?".
When we say "a thread (system) is preemptive", the threads are switched automatically without being explicitly switched by the user. Looking at this from the opposite direction, the user can't control the timing of switching threads.
On the other hand, in a non-preemptive thread system, until the user explicitly says "I can pass control to the next thread", threads will never be switched. Looking at this from the opposite direction, when and where there's a possibility of switching threads is obvious.
This distinction also applies to processes, and in that case preemptive is considered "superior". For example, if a program had a bug and entered an infinite loop, a non-preemptive system would never be able to switch processes. This means a user program can halt the whole system, which is not good. Process switching was non-preemptive on Windows 3.1 because its base was MS-DOS, but Windows 95 is preemptive, so the system is more robust. Hence, it is said that Windows 95 is "superior" to 3.1.
Then, how about the ruby thread? It is preemptive at Ruby level, and non-preemptive at C level. In other words, when you are writing C code, you can determine almost with certainty the timings of switching threads.
Why is it designed this way? Threads are indeed convenient, but their users also need a certain mindset. It means that the code has to be compatible with threads (it must be multi-thread safe). In other words, in order to make it preemptive at C level as well, all the C libraries would have to be thread safe.
But in reality, there are also a lot of C libraries that are still not thread safe. A lot of effort was made to make extension libraries easy to write, and it would be counterproductive if the number of usable libraries were decreased by requiring thread safety. Therefore, non-preemptive at C level is a reasonable choice for ruby.
Management System
We've understood that the ruby thread is non-preemptive at C level. It means that after it runs for a while, it voluntarily lets go of control. Then, I'd like you to suppose that the currently executing thread is about to give up execution. Who will receive control next? But before that, it's impossible to guess without knowing how threads are expressed inside ruby in the first place. Let's look at the variables and the data types used to manage threads.
▼ the structure to manage threads
typedef struct thread * rb_thread_t;
static rb_thread_t curr_thread = 0;
static rb_thread_t main_thread;

struct thread {
    struct thread *next, *prev;

(eval.c)
Since struct thread is very huge for some reason, this time I narrowed it down to only the important part. That is why there are only these two members. next and prev are member names, and their types are rb_thread_t, thus we can expect that rb_thread_t values are connected by a doubly linked list. And actually it is not an ordinary doubly linked list: both ends are connected. It means it is circular. This is a big point. Adding the static main_thread and curr_thread variables to it, the whole data structure would look like Figure 1.
Figure 1: the data structures to manage threads
main_thread (main thread) means the thread that existed at the time when the program started, meaning the "first" thread. curr_thread is obviously the current thread, meaning the thread currently running. The value of main_thread will never change while the process is running, but the value of curr_thread will change frequently.
In this way, because the list is a circle, the procedure to choose "the next thread" becomes easy. It can be done by merely following the next link. By this alone, we can run all threads equally to some extent.
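How little is needed to pick "the next thread" from a circular list can be seen in a few lines. The sketch below is a toy model and makes no claim about the details of eval.c: HgThread and the three helpers are invented names, and inserting a new thread just before the given one is my own assumption about where new entries go.

```ruby
# Hypothetical miniature of struct thread: a label and the two links.
HgThread = Struct.new(:label, :next, :prev)

def hg_make_main_thread
  main = HgThread.new("main")
  main.next = main.prev = main   # a ring of one: both links point to itself
  main
end

# Insert a new thread just before `curr`, i.e. at the "end" of the ring.
def hg_add_thread(curr, label)
  th = HgThread.new(label, curr, curr.prev)
  curr.prev.next = th
  curr.prev = th
  th
end

# Follow the next link `steps` times, recording who would run.
def hg_round_robin(start, steps)
  order = []
  th = start
  steps.times do
    order << th.label
    th = th.next
  end
  order
end

hg_main = hg_make_main_thread
hg_add_thread(hg_main, "t1")
hg_add_thread(hg_main, "t2")
HG_RING_RESULT = hg_round_robin(hg_main, 5)
p HG_RING_RESULT
```

Because the ring has no end, the traversal never needs a special case for wrapping around, and a ring of one is simply a thread whose next is itself.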
What does switching threads mean?
By the way, what is a thread in the first place? Or, what makes us say that threads are switched?
These are very difficult questions. Just as with what a program is or what an object is, when asked about things that are usually understood by feeling, it's hard to answer clearly. In particular, "what is the difference between threads and processes?" is a good question.
Still, within a realistic range, we can describe it to some extent. What is necessary for threads is the context of execution. As for the context of ruby, as we've seen by now, it consists of ruby_frame, ruby_scope, ruby_class and so on. And ruby allocates the substance of ruby_frame on the machine stack, and there is also the stack space used by extension libraries, therefore the machine stack is also necessary as part of the context of a Ruby program. And finally, the CPU registers are indispensable. These various contexts are the elements that make threads possible, and switching them means switching threads. This is also called a "context switch".
The way of context-switching
What remains is how to switch contexts. ruby_scope and ruby_class are easy to replace: allocate spaces for them somewhere such as the heap and set them aside one by one. The CPU registers can also be handled, because we can save them and write them back by using setjmp() and longjmp(). The spaces for both purposes are respectively prepared in rb_thread_t.
▼ struct thread (partial)
struct thread {
    struct thread *next, *prev;
    jmp_buf context;

    struct FRAME *frame;        /* ruby_frame */
    struct SCOPE *scope;        /* ruby_scope */
    struct RVarmap *dyna_vars;  /* ruby_dyna_vars */
    struct BLOCK *block;        /* ruby_block */
    struct iter *iter;          /* ruby_iter */
    struct tag *tag;            /* prot_tag */
    VALUE klass;                /* ruby_class */
    VALUE wrapper;              /* ruby_wrapper */
    NODE *cref;                 /* ruby_cref */

    int flags;  /* scope_vmode / rb_trap_immediate / raised */

    NODE *node;                 /* rb_current_node */

    int tracing;                /* tracing */
    VALUE errinfo;              /* $! */
    VALUE last_status;          /* $? */
    VALUE last_line;            /* $_ */
    VALUE last_match;           /* $~ */

    int safe;                   /* ruby_safe_level */

(eval.c)
As shown above, there are members that seem to correspond to ruby_frame and ruby_scope. There's also a jmp_buf to save the registers.
Then, the problem is the machine stack. How can we substitute it?
The most straightforward way for the mechanism is to directly overwrite the pointer to the position (end) of the stack. Usually, it is in the CPU registers. Sometimes it is a specific register, and it is also possible that a general-purpose register is allocated for it. Anyway, it is somewhere. For convenience, we'll call it the stack pointer from now on. It is obvious that a different space can be used as the stack by modifying it. But it is also obvious that with this approach we have to deal with it for each CPU and for each OS, thus it is really hard to preserve portability.
Therefore, ruby uses a very violent way to implement the substitution of the machine stack. That is, if we can't modify the stack pointer, let's modify the place the stack pointer points to. We know the stack can be directly modified, as we've seen in the description of the garbage collection; the rest is slightly changing what to do. The place to store the stack properly exists in struct thread.
▼ struct thread (partial)
    int   stk_len;      /* the stack length */
    int   stk_max;      /* the size of memory allocated for stk_ptr */
    VALUE *stk_ptr;     /* the copy of the stack */
    VALUE *stk_pos;     /* the position of the stack */

(eval.c)
How the explanation goes
So far, I've talked about various things, but the important points can be summarized into these three:

When
To which thread
How

to switch context. These are also the points of this chapter. Below, I'll describe them using one section for each of the three points.
Trigger
To begin with, it's the first point, when to switch threads. In other words, what is the cause of switching threads.
Waiting I/O
For example, when trying to read in something by calling IO#gets or IO#read, since we can expect it will take a lot of time to read, it's better to run the other threads in the meantime. In other words, a forcible switch becomes necessary here. Below is the interface of getc.
▼ rb_getc()
int
rb_getc(f)
    FILE *f;
{
    int c;

    if (!READ_DATA_PENDING(f)) {
        rb_thread_wait_fd(fileno(f));
    }
    TRAP_BEG;
    c = getc(f);
    TRAP_END;

    return c;
}

(io.c)
READ_DATA_PENDING(f) is a macro that checks whether the file’s buffer still has content. If the buffer has content, the read can proceed without any waiting time, so it reads immediately. If the buffer is empty, the read will take some time, so it calls rb_thread_wait_fd(). This is an indirect cause of switching threads.
If rb_thread_wait_fd() is “indirect”, there should also be a “direct” cause. What is it? Let’s look inside rb_thread_wait_fd().
▼ rb_thread_wait_fd()
void
rb_thread_wait_fd(fd)
    int fd;
{
    if (rb_thread_critical) return;
    if (curr_thread == curr_thread->next) return;
    if (curr_thread->status == THREAD_TO_KILL) return;

    curr_thread->status = THREAD_STOPPED;
    curr_thread->fd = fd;
    curr_thread->wait_for = WAIT_FD;
    rb_thread_schedule();
}
(eval.c)
There’s rb_thread_schedule() on the last line. This function is the “direct cause”. It is the heart of the implementation of ruby threads: it selects the next thread and switches to it.
What let me see that this function has such a role is that I already knew the word “scheduling” in connection with threads. Even if you didn’t know it, you will remember it now and notice it next time.
And, in this case, it does not merely pass control to another thread; it also stops itself. Moreover, it has an explicit deadline, namely “until the fd becomes readable”. Therefore, this request has to be communicated to rb_thread_schedule(). That is the part that assigns various things to the members of curr_thread. The reason for stopping is stored in wait_for, and the information to be used when waking up is stored in fd.
Waiting the other thread
Now that we understand that threads are switched at rb_thread_schedule(), we can go the other way: from the places where rb_thread_schedule() appears, we can find the places where threads are switched. Scanning for it, I found it in the function named rb_thread_join().
▼ rb_thread_join() (partial)
static int
rb_thread_join(th, limit)
    rb_thread_t th;
    double limit;
{

    curr_thread->status = THREAD_STOPPED;
    curr_thread->join = th;
    curr_thread->wait_for = WAIT_JOIN;
    curr_thread->delay = timeofday() + limit;
    if (limit < DELAY_INFTY) curr_thread->wait_for |= WAIT_TIME;
    rb_thread_schedule();
(eval.c)
This function is the substance of Thread#join, and Thread#join is a method that waits until the receiver thread ends. Indeed, since there is time to wait, running the other threads is economical. With this, the second reason to switch is found.
Waiting For Time
Moreover, rb_thread_schedule() was also found in the function named rb_thread_wait_for(). This is the substance of (Ruby’s) sleep and the like.
▼ rb_thread_wait_for (simplified)
void
rb_thread_wait_for(time)
    struct timeval time;
{
    double date;

    date = timeofday() + (double)time.tv_sec + (double)time.tv_usec*1e-6;
    curr_thread->status = THREAD_STOPPED;
    curr_thread->delay = date;
    curr_thread->wait_for = WAIT_TIME;
    rb_thread_schedule();
}
(eval.c)
timeofday() returns the current time. Because the value of time is added to it, date indicates the time when the waiting period is over. In other words, this is the request “I’d like to stop until this specific time”.
Switch by expirations
In all the cases above, threads are switched as a consequence of some operation done at the Ruby level. In other words, up to this point, the Ruby level is non-preemptive as well. With only this, if a program were to single-mindedly keep calculating, one particular thread would keep running forever. Therefore, we need to make a thread voluntarily give up control after running for a while. How long a thread can run before it has to stop is what I’ll talk about next.
setitimer
Since it is the same thing every time, I feel I lack the skill to entertain, but I searched further for places that call rb_thread_schedule(). This time it was found in a strange place. It is here.
▼ catch_timer()
static void
catch_timer(sig)
    int sig;
{
#if !defined(POSIX_SIGNAL) && !defined(BSD_SIGNAL)
    signal(sig, catch_timer);
#endif
    if (!rb_thread_critical) {
        if (rb_trap_immediate) {
            rb_thread_schedule();
        }
        else rb_thread_pending = 1;
    }
}
(eval.c)
This seems to be something related to signals. What is it? I traced where this catch_timer() function is used, and it was used around here:
▼ rb_thread_start_0() (partial)
static VALUE
rb_thread_start_0(fn, arg, th_arg)
    VALUE (*fn)();
    void *arg;
    rb_thread_t th_arg;
{

#if defined(HAVE_SETITIMER)
    if (!thread_init) {
#ifdef POSIX_SIGNAL
        posix_signal(SIGVTALRM, catch_timer);
#else
        signal(SIGVTALRM, catch_timer);
#endif

        thread_init = 1;
        rb_thread_start_timer();
    }
#endif
(eval.c)
This means that catch_timer is the signal handler of SIGVTALRM.
Here, the question becomes “what kind of signal is SIGVTALRM?”. This is actually the signal sent when using the system call named setitimer. That’s why there’s a check of HAVE_SETITIMER just before it. setitimer is an abbreviation of “SET Interval TIMER”, a system call that tells the OS to send signals at a certain interval.
Then, where is setitimer called? It is in rb_thread_start_timer(), which happens to be located at the end of this listing.
To sum it all up, we get the following scenario. setitimer is used to send signals at a certain interval. The signals are caught by catch_timer(). There, rb_thread_schedule() is called and threads are switched. Perfect.
However, signals can occur at any time. If everything were based only on what has been described so far, it would mean that ruby is preemptive at the C level as well. So I’d like you to look at the code of catch_timer() again.
if (rb_trap_immediate) {
    rb_thread_schedule();
}
else rb_thread_pending = 1;
There’s a condition: rb_thread_schedule() is done only when rb_trap_immediate is true. This is the point. rb_trap_immediate expresses, as the name suggests, “whether or not to process signals immediately”, and it is usually false. It becomes true only for limited periods, such as while doing I/O on a single thread. In the source code, that is the part between TRAP_BEG and TRAP_END.
On the other hand, when it is false, rb_thread_pending is set, so let’s follow this. This variable is used in the following place.
▼ CHECK_INTS − HAVE_SETITIMER
#if defined(HAVE_SETITIMER) && !defined(__BOW__)
EXTERN int rb_thread_pending;
#define CHECK_INTS do {\
    if (!rb_prohibit_interrupt) {\
        if (rb_trap_pending) rb_trap_exec();\
        if (rb_thread_pending && !rb_thread_critical)\
            rb_thread_schedule();\
    }\
} while (0)
(rubysig.h)
In this way, inside CHECK_INTS, rb_thread_pending is checked and rb_thread_schedule() is called. This means that when SIGVTALRM is received, rb_thread_pending becomes true, and the thread is then switched the next time execution goes through CHECK_INTS.
This CHECK_INTS has appeared in various places by now, for example rb_eval(), rb_call0() and rb_yield_0(). CHECK_INTS would be meaningless unless it is placed somewhere that is passed through frequently. Therefore, it is natural for it to be in the important functions.
tick
We understood the case where setitimer exists. But what if setitimer does not exist? Actually, the answer is in CHECK_INTS, which we’ve just seen. It is the definition on the #else side.
▼ CHECK_INTS − not HAVE_SETITIMER
EXTERN int rb_thread_tick;
#define THREAD_TICK 500
#define CHECK_INTS do {\
    if (!rb_prohibit_interrupt) {\
        if (rb_trap_pending) rb_trap_exec();\
        if (!rb_thread_critical) {\
            if (rb_thread_tick-- <= 0) {\
                rb_thread_tick = THREAD_TICK;\
                rb_thread_schedule();\
            }\
        }\
    }\
} while (0)
(rubysig.h)
Every time execution goes through CHECK_INTS, rb_thread_tick is decremented. When it reaches 0, rb_thread_schedule() is called. In other words, the mechanism is that the thread is switched after going through CHECK_INTS THREAD_TICK (=500) times.
Scheduling
The second point is which thread to switch to. The one solely responsible for this decision is rb_thread_schedule().
rb_thread_schedule()
The important functions of ruby are always huge. This rb_thread_schedule() has more than 220 lines. Let’s thoroughly divide it into portions.
▼ rb_thread_schedule() (outline)
void
rb_thread_schedule()
{
    rb_thread_t next;           /* OK */
    rb_thread_t th;
    rb_thread_t curr;
    int found = 0;

    fd_set readfds;
    fd_set writefds;
    fd_set exceptfds;
    struct timeval delay_tv, *delay_ptr;
    double delay, now;          /* OK */
    int n, max;
    int need_select = 0;
    int select_timeout = 0;

    rb_thread_pending = 0;
    if (curr_thread == curr_thread->next
        && curr_thread->status == THREAD_RUNNABLE)
        return;

    next = 0;
    curr = curr_thread;         /* starting thread */

    while (curr->status == THREAD_KILLED) {
        curr = curr->prev;
    }

    /* ……prepare the variables used at select…… */
    /* ……select if necessary…… */
    /* ……decide the thread to invoke next…… */
    /* ……context-switch…… */
}
(eval.c)
(A) When there’s only one thread, this does nothing and returns immediately. Therefore, the discussion from here on can assume that there are always multiple threads.
(B) Next comes the initialization of the variables. We can consider everything up to and including the while loop to be initialization. Since curr follows prev, it ends up set to the last thread that is alive (status != THREAD_KILLED). It is not “the first” one because there are a lot of loops that “start with the thread after curr, deal with curr last, and end”.
After that, we can see the statements concerning select. Since the thread switching of ruby depends considerably on select, let’s first study select here in advance.
select
select is a system call that waits until a certain file becomes ready for reading or writing. Its prototype is this:
int select(int max,
           fd_set *readset, fd_set *writeset, fd_set *exceptset,
           struct timeval *timeout);
In a variable of type fd_set, a set of fds that we want to check is stored. The first argument max is “(the maximum value of the fds in the fd_sets) + 1”. timeout is the maximum time select waits. If timeout is NULL, it waits forever. If timeout points to a zero-valued struct timeval, it does not wait even for a moment: it only checks and returns immediately. As for the return value, I’ll talk about it when we use it.
I’ll talk about fd_set in detail. fd_set can be manipulated by using the macros below:
▼ fd_set manipulation
fd_set set;

FD_ZERO(&set)       /* initialize */
FD_SET(fd, &set)    /* add a file descriptor fd to the set */
FD_ISSET(fd, &set)  /* true if fd is in the set */
fd_set is typically a bit array, and when we want to check the n-th file descriptor, the n-th bit is set (Figure 2).
Figure 2: fd_set
I’ll show a simple usage example of select.
▼ a usage example of select
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/time.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    char buf[1024];
    fd_set readset;

    FD_ZERO(&readset);                 /* initialize readset */
    FD_SET(STDIN_FILENO, &readset);    /* put stdin into the set */
    select(STDIN_FILENO + 1, &readset, NULL, NULL, NULL);
    read(STDIN_FILENO, buf, 1024);     /* success without delay */
    exit(0);
}
This code assumes that the system calls always succeed, so there are no error checks at all. I’d like you to look only at the flow FD_ZERO → FD_SET → select. Since the fifth argument of select, timeout, is NULL here, this select call waits forever for stdin to become readable. And since this select has completed, the following read does not have to wait at all. By putting a print in the middle, you will get a better understanding of its behavior. A slightly more detailed example is included on the attached CD-ROM (see also doc/select.html).
Preparations for select
Now, we’ll go back to the code of rb_thread_schedule(). Since this code branches based on the reason why threads are waiting, I’ll show the content in shortened form.
▼ rb_thread_schedule() − preparations for select
  again:
    /* initialize the variables relating to select */
    max = -1;
    FD_ZERO(&readfds);
    FD_ZERO(&writefds);
    FD_ZERO(&exceptfds);
    delay = DELAY_INFTY;
    now = -1.0;

    FOREACH_THREAD_FROM(curr, th) {
        if (!found && th->status <= THREAD_RUNNABLE) {
            found = 1;
        }
        if (th->status != THREAD_STOPPED) continue;
        if (th->wait_for & WAIT_JOIN) {
            /* ……join wait…… */
        }
        if (th->wait_for & WAIT_FD) {
            /* ……I/O wait…… */
        }
        if (th->wait_for & WAIT_SELECT) {
            /* ……select wait…… */
        }
        if (th->wait_for & WAIT_TIME) {
            /* ……time wait…… */
        }
    }
    END_FOREACH_FROM(curr, th);
(eval.c)
Whether it is intended or not, what stands out are the macros named FOREACH-something. These two are defined as follows:
▼ FOREACH_THREAD_FROM
#define FOREACH_THREAD_FROM(f,x) x = f; do { x = x->next;
#define END_FOREACH_FROM(f,x) } while (x != f)
(eval.c)
Let’s expand them to make them easier to understand.
th = curr;
do {
    th = th->next;
    {
        .....
    }
} while (th != curr);
This means: follow the circular list of threads starting from the one after curr, process curr last, and end; meanwhile the variable th is used. This reminds me of Ruby’s iterators… or is that too much imagination?
Now let’s go back to the rest of the code. It uses this slightly strange loop to check whether there is any thread that needs select. As we’ve seen previously, select can wait for reading/writing/exception/time all at once, so you can probably see that I/O waits and time waits can be consolidated into a single select. And though I didn’t describe it in the previous section, select waits are also possible. There’s a method named IO.select in Ruby’s library, and you can use rb_thread_select() at the C level. Therefore, we need to execute that select at the same time. By merging the fd_sets, multiple selects can be done at once.
What remains is only the join wait. Let’s look at its code, just in case.
▼ rb_thread_schedule() − select preparation − join wait
        if (th->wait_for & WAIT_JOIN) {
            if (rb_thread_dead(th->join)) {
                th->status = THREAD_RUNNABLE;
                found = 1;
            }
        }
(eval.c)
The meaning of rb_thread_dead() is obvious from its name. It determines whether or not the thread given as the argument has finished.
Calling select
By now, we’ve figured out whether select is necessary, and if it is, its fd_sets have already been prepared. Even if there’s a thread that can be invoked immediately (THREAD_RUNNABLE), we need to call select beforehand. It’s possible that there is a thread whose I/O wait finished a while ago and which has a higher priority. In that case, though, we tell select to return immediately and let it only check whether I/O has completed.
▼ rb_thread_schedule() − select
    if (need_select) {
        /* convert delay into timeval */
        /* if there are immediately invocable threads, do only I/O checks */
        if (found) {
            delay_tv.tv_sec = 0;
            delay_tv.tv_usec = 0;
            delay_ptr = &delay_tv;
        }
        else if (delay == DELAY_INFTY) {
            delay_ptr = 0;
        }
        else {
            delay_tv.tv_sec = delay;
            delay_tv.tv_usec = (delay - (double)delay_tv.tv_sec)*1e6;
            delay_ptr = &delay_tv;
        }

        n = select(max+1, &readfds, &writefds, &exceptfds, delay_ptr);
        if (n < 0) {
            /* ……being cut in by signal or something…… */
        }
        if (select_timeout && n == 0) {
            /* ……timeout…… */
        }
        if (n > 0) {
            /* ……properly finished…… */
        }
        /* In a somewhere thread, its I/O wait has finished.
           roll the loop again to detect the thread */
        if (!found && delay != DELAY_INFTY)
            goto again;
    }
(eval.c)
The first half of the block is as written in the comments. Since delay is the time, in seconds, until some thread next becomes invocable, it is converted into timeval form.
In the last half, it actually calls select and branches based on the result. Since this code is long, I divided it again. When interrupted by a signal, it either goes back to the beginning and processes again, or it becomes an error. The meaningful ones are the remaining two.
Timeout
When select times out, a thread in a time wait or select wait may have become invocable. Check for this and search for runnable threads. If one is found, set it to THREAD_RUNNABLE.
Completing normally
If select completes normally, it means either that the I/O is ready or that a select wait has ended. Search for the threads that are no longer waiting by checking the fd_sets. If one is found, set it to THREAD_RUNNABLE.
Decide the next thread
Taking all this information into consideration, we finally decide the next thread to invoke. Since everything that was invocable and everything that had finished waiting has become RUNNABLE, one of them can be picked arbitrarily.
▼ rb_thread_schedule() − decide the next thread
    FOREACH_THREAD_FROM(curr, th) {
        if (th->status == THREAD_TO_KILL) {              /* (A) */
            next = th;
            break;
        }
        if (th->status == THREAD_RUNNABLE && th->stk_ptr) {
            if (!next || next->priority < th->priority)  /* (B) */
               next = th;
        }
    }
    END_FOREACH_FROM(curr, th);
(eval.c)
(A) If there’s a thread that is about to finish, give it high priority and let it finish.
(B) Find something that seems runnable. However, it seems to take the value of priority into account. This member can also be modified from the Ruby level by using Thread#priority and Thread#priority=. ruby itself does not particularly modify it.
If these are done but the next thread could not be found, in other words if next was not set, what happens? Since select has already been done, at least one of the threads in a time wait or an I/O wait should have finished waiting. If there is none, all that remains are waits for other threads, and moreover there are no runnable threads, so this wait will never end. This is a deadlock.
Of course, a deadlock can happen for other reasons too, but in general it’s very hard to detect a deadlock. Especially in the case of ruby, since Mutex and the like are implemented at the Ruby level, perfect detection is nearly impossible.
Switching Threads
The next thread to invoke has been determined. The I/O and select checks have also been done. The rest is transferring control to the target thread. However, for the last part of rb_thread_schedule() and the code that switches threads, I’ll start a new section.
Context Switch
The third and last point is thread switching, that is, context switching. This is the most interesting part of the threads of ruby.
The Base Line
We’ll start with the tail of rb_thread_schedule(). Since the story of this section is very complex, I’ll go with a significantly simplified version.
▼ rb_thread_schedule() (context switch)
if (THREAD_SAVE_CONTEXT(curr)) {
    return;
}
rb_thread_restore_context(next, RESTORE_NORMAL);
As for THREAD_SAVE_CONTEXT(), we need to expand its content in several places in order to understand it.
▼ THREAD_SAVE_CONTEXT()
#define THREAD_SAVE_CONTEXT(th) \
    (rb_thread_save_context(th),thread_switch(setjmp((th)->context)))

static int
thread_switch(n)
    int n;
{
    switch (n) {
      case 0:
        return 0;
      case RESTORE_FATAL:
        JUMP_TAG(TAG_FATAL);
        break;
      case RESTORE_INTERRUPT:
        rb_interrupt();
        break;
      /* ……process various abnormal things…… */
      case RESTORE_NORMAL:
      default:
        break;
    }
    return 1;
}
(eval.c)
If I merge the three and expand them, here is the result:
rb_thread_save_context(curr);
switch (setjmp(curr->context)) {
  case 0:
    break;
  case RESTORE_FATAL:
    ....
  case RESTORE_INTERRUPT:
    ....
  /* ……process abnormals…… */
  case RESTORE_NORMAL:
  default:
    return;
}
rb_thread_restore_context(next, RESTORE_NORMAL);
RESTORE_NORMAL appears both as a return value of setjmp() and as an argument of rb_thread_restore_context(); this is clearly suspicious. Since rb_thread_restore_context() does longjmp(), we can expect a correspondence between setjmp() and longjmp(). And if we also guess the meaning from the function names,
save the context of the current thread
setjmp
restore the context of the next thread
longjmp
the rough main flow would probably look like this. However, what we have to be careful about here is that this pair of setjmp() and longjmp() is not completed within this thread. setjmp() is used to save the context of this thread, and longjmp() is used to restore the context of the next thread. In other words, there’s a chain of setjmp()/longjmp() as follows (Figure 3).
Figure 3: the backstitch by chaining of setjmp
We can restore roughly the CPU registers with setjmp()/longjmp(), so the remaining context is the Ruby stacks plus the machine stack. rb_thread_save_context() saves them, and rb_thread_restore_context() restores them. Let’s look at each of them in order.
rb_thread_save_context()
Now, we’ll start with rb_thread_save_context(), which saves a context.
▼ rb_thread_save_context() (simplified)
static void
rb_thread_save_context(th)
    rb_thread_t th;
{
    VALUE *pos;
    int len;
    static VALUE tval;

    len = ruby_stack_length(&pos);
    th->stk_len = 0;
    th->stk_pos = (rb_gc_stack_start<pos)?rb_gc_stack_start
                                         :rb_gc_stack_start - len;
    if (len > th->stk_max) {
        REALLOC_N(th->stk_ptr, VALUE, len);
        th->stk_max = len;
    }
    th->stk_len = len;
    FLUSH_REGISTER_WINDOWS;
    MEMCPY(th->stk_ptr, th->stk_pos, VALUE, th->stk_len);

    /* …………omission………… */
}
(eval.c)
The last half just keeps assigning global variables such as ruby_scope into th, so it is omitted because it is not interesting. The rest, the part shown above, attempts to copy the entire machine stack into the place th->stk_ptr points to.
First, ruby_stack_length() writes the head address of the stack into the parameter pos and returns its length. The range of the stack is determined using this value, and the address of the bottom-end side is set to th->stk_pos. There are some branches; this is because both a stack extending higher and a stack extending lower are possible (Figure 4).
Figure 4: a stack extending above and a stack extending below
After that, the rest is allocating memory at the place th->stk_ptr points to and copying the stack: allocate memory of size th->stk_max, then copy len elements of the stack.
FLUSH_REGISTER_WINDOWS was described in Chapter 5: Garbage collection, so its explanation is probably no longer necessary. This is a macro (whose substance is written in assembler) that writes the cache of the stack space out to memory. It must be called whenever the target is the entire stack.
rb_thread_restore_context()
And finally, rb_thread_restore_context(), the function that restores a thread.
▼ rb_thread_restore_context()
static void
rb_thread_restore_context(th, exit)
    rb_thread_t th;
    int exit;
{
    VALUE v;
    static rb_thread_t tmp;
    static int ex;
    static VALUE tval;

    if (!th->stk_ptr) rb_bug("unsaved context");

    if (&v < rb_gc_stack_start) {
        /* the machine stack extending lower */
        if (&v > th->stk_pos) stack_extend(th, exit);
    }
    else {
        /* the machine stack extending higher */
        if (&v < th->stk_pos + th->stk_len) stack_extend(th, exit);
    }

    /* omission …… back the global variables */

    tmp = th;
    ex = exit;
    FLUSH_REGISTER_WINDOWS;
    MEMCPY(tmp->stk_pos, tmp->stk_ptr, VALUE, tmp->stk_len);

    tval = rb_lastline_get();
    rb_lastline_set(tmp->last_line);
    tmp->last_line = tval;
    tval = rb_backref_get();
    rb_backref_set(tmp->last_match);
    tmp->last_match = tval;

    longjmp(tmp->context, ex);
}
(eval.c)
The th parameter is the thread to give execution back to. MEMCPY() and longjmp() in the last half are the heart. The closer MEMCPY() is to the end, the better, because after this operation the stack is in a destroyed state until longjmp().
Nevertheless, there are rb_lastline_set() and rb_backref_set() in between. They restore $_ and $~. Since these two variables are not only local variables but also thread-local variables, even a single local variable slot has as many copies as there are threads. This must be done here because the place actually being written back is the stack. Because they are local variables, their slots are allocated with alloca().
That’s it for the basics. But if we merely write the stack back, then in the case where the stack of the current thread is shorter than the stack of the thread to switch to, the stack frame of the currently executing function (that is, rb_thread_restore_context) would be overwritten. This means the content of the th parameter would be destroyed. Therefore, to prevent this from occurring, we first need to extend the stack. This is done by the stack_extend() in the first half.
▼ stack_extend()
static void
stack_extend(th, exit)
    rb_thread_t th;
    int exit;
{
    VALUE space[1024];

    memset(space, 0, 1);        /* prevent array from optimization */
    rb_thread_restore_context(th, exit);
}
(eval.c)
By allocating a local variable of 1K elements (which is placed in the machine stack space), the stack is forcibly extended. However, and this is a matter of course, returning from stack_extend() means the extended stack shrinks again immediately. This is why rb_thread_restore_context() is called again right there.
By the way, completing the task of rb_thread_restore_context() means reaching the call of longjmp(), and once that is called it never returns. Obviously, the call of stack_extend() never returns either. Therefore, rb_thread_restore_context() does not have to think about what to do after returning from stack_extend().
Issues
This is the implementation of the ruby thread switch. We can’t call it lightweight. Plenty of malloc() and realloc(), plenty of memcpy(), setjmp() and longjmp(), and on top of that calling functions to extend the stack. It is no exaggeration to say “it is deadly heavy”. But in exchange, there are no system calls that depend on a particular OS, and there is just a little assembly, only for the register windows of SPARC. Indeed, this seems to be highly portable.
There’s another problem. Because the stacks of all threads are allocated at the same address, code that uses pointers into the stack space may not be runnable. In fact, Tcl/Tk falls squarely into this situation; to get around it, Ruby’s Tcl/Tk interface reluctantly chose to access it only from the main thread.
Of course, this does not go along with native threads. It would be necessary to restrict ruby threads to run only on a particular native thread in order to let them work properly. In UNIX, there are still only a few libraries that use a lot of threads. But in Win32, because threads are running all the time, we need to be careful about it.
The original work is Copyright © 2002-2004 Minero AOKI.
Translated by Vincent ISAMBART and Clifford Escobar CAOILE
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License
Ruby Hacking Guide
Final Chapter: Ruby’s future
Issues to be addressed
ruby isn’t ‘completely finished software’. It’s still being developed, and there are still a lot of issues. First, we want to try to remove the inherent problems of the current interpreter.
The order of the topics is mostly the same as the order of the chapters of this book.
Performance of GC
The performance of the current GC might be described as “not notably bad, but not notably good”. “Not notably bad” means “it won’t cause trouble in daily life”, and “not notably good” means “its downside will be exposed under heavy load”. For example, an application which creates plenty of objects and keeps holding them would slow down radically. Every time GC runs, it needs to mark all of the objects, and furthermore GC needs to be invoked more often because it can’t collect them. To counter this problem, Generational GC, which was mentioned in Chapter 5, should be effective. (At least, so it is said in theory.)
Also, regarding its response speed, there is still room for improvement. With the current GC, the entire interpreter stops while it is running. Thus, when the program is an editor or a GUI application, it sometimes freezes and stops reacting. Even if it’s just 0.1 second, stopping while characters are being typed gives a very bad impression. Currently, few such applications have been created, and even where they exist, they might be small enough not to expose this problem. However, if such applications are actually created in the future, it might become necessary to consider Incremental GC.
Implementation of parser
As we saw in Part 2, the implementation of the ruby parser already uses yacc’s ability almost to its limit, so I can’t think it can endure further expansion. That’s all right if no expansion is planned, but a big one, “keyword arguments”, is planned next, and it would be sad if we could not express some other demanded grammar because of the limitations of yacc.
Reuse of parser
Ruby’s parser is very complex. In particular, dealing seriously with everything around lex_state is very hard. Because of this, embedding a Ruby program or creating a program that deals with Ruby programs themselves is quite difficult.
For example, I’m developing a tool named racc, which is prefixed with R because it is a Ruby-version yacc. With racc, the syntax of grammar files is almost the same as yacc’s, but we can write actions in Ruby. To do so, it cannot determine the end of an action without parsing Ruby code properly, but “properly” is very difficult. Since there’s no other choice, I’ve currently compromised at the level where it can parse “almost all” code.
As other examples that require analyzing Ruby programs, I can list tools like indent and lint, but creating such tools also requires a lot of effort. It would be hopeless for something complex like a refactoring tool.
Then, what can we do? If we can’t recreate the same thing, what if ruby’s original parser could be used as a component? In other words, making the parser itself a library. This is a feature we want by all means.
However, the problem here is that as long as yacc is used, we cannot make the parser reentrant. This means, say, we cannot call yyparse() recursively, and we cannot call it from multiple threads. Therefore, it would have to be implemented in a way that does not return control to Ruby while parsing.
Hiding Code
Current ruby cannot run a program without its source code. Thus, people who don’t want others to read their source code might have trouble.
Interpreter Object
Currently a process cannot have multiple ruby interpreters; this was discussed in Chapter 13. If having multiple interpreters were practically possible, it would seem better, but is it possible to implement such a thing?
The structure of evaluator
The current eval.c is, above all, too complex. Embedding Ruby’s stack frames in the machine stack can occasionally become a source of trouble, and using setjmp() and longjmp() aggressively makes it harder to understand and slows it down. Particularly on RISC machines, which have many registers, using setjmp() aggressively can easily cause a slowdown because setjmp() saves everything in the registers.
The performance of evaluator
ruby is already fast enough for ordinary use. But aside from that, for a language processor, faster is definitely better. To achieve better performance, in other words to optimize, what can we do? In such a case, the first thing we have to do is profiling. So I profiled.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 20.25      1.64     1.64  2638359     0.00     0.00  rb_eval
 12.47      2.65     1.01  1113947     0.00     0.00  ruby_re_match
  8.89      3.37     0.72  5519249     0.00     0.00  rb_call0
  6.54      3.90     0.53  2156387     0.00     0.00  st_lookup
  6.30      4.41     0.51  1599096     0.00     0.00  rb_yield_0
  5.43      4.85     0.44  5519249     0.00     0.00  rb_call
  5.19      5.27     0.42   388066     0.00     0.00  st_foreach
  3.46      5.55     0.28  8605866     0.00     0.00  rb_gc_mark
  2.22      5.73     0.18  3819588     0.00     0.00  call_cfunc
This is a profile of running one particular application, but it is approximately the profile of a general Ruby program. rb_eval() appears at the top with an overwhelming percentage; after that, in addition to the functions of the GC and the evaluator core, functions that are specific to the program are mixed in. For example, in the case of this application, regular expression matching (ruby_re_match) takes a lot of time.
However, even if we understand this, the question is how to improve it. To think simply, it can be achieved by making rb_eval() faster. That said, as for the ruby core, there is almost no room left that can be easily optimized. For instance, the “tail recursion → goto conversion” used at NODE_IF and elsewhere has apparently already been applied in almost all the places where it can be applied. In other words, without fundamentally changing the way of thinking, there’s no room to improve.
The implementation of thread
This was also discussed in Chapter 19. There are really a lot of issues with the implementation of the current ruby’s threads. In particular, they mix very badly with native threads. The two great advantages of ruby’s threads, (1) high portability and (2) the same behavior everywhere, are definitely incomparable, but that implementation is probably something we cannot continue to use forever, is it?
ruby 2
Next, on the other hand, I’ll introduce the direction the original ruby is taking, and how it is trying to counter these issues.
Rite
At the present time, ruby’s leading edge is 1.6.7 as the stable version and 1.7.3 as the development version, but the next stable version 1.8 will perhaps come out in the near future. At that point, the next development version 1.9.0 will start at the same time. And after that, this is a little irregular, but 1.9.1 will be the next stable version.
stable    development    when to start
1.6.x     1.7.x          1.6.0 was released on 2000-09-19
1.8.x     1.9.x          probably it will come out within 6 months
1.9.1~    2.0.0          maybe about 2 years later
And the next-to-next generation development version is ruby 2, whose code name is Rite. Apparently this name pays respect to the shortcoming that Japanese speakers cannot distinguish the sounds of L and R.
What will change in 2.0 is, in short, almost the entire core. Thread, evaluator, parser: all of them will be changed. However, nothing has been written as code yet, so what is written here is entirely just a “plan”. If you expect too much, you may be disappointed. Therefore, for now, let’s have only modest expectations.
The language to write
First, the language to use. It will definitely be C. Mr. Matsumoto said on ruby-talk, which is the English mailing list for Ruby,
I hate C++.
So, C++ is most unlikely. Even if all the parts are recreated, it is reasonable to expect that the object system will remain almost the same, so it is necessary not to add extra effort around it. However, chances are good that it will be ANSI C next time.
GC
Regarding the implementation of GC, a good starting point would be Boehm GC (http://www.hpl.hp.com/personal/Hans_Boehm/gc). Boehm GC is a conservative, incremental and generational GC; furthermore, it can mark all the stack spaces of all threads even while native threads are running. It’s really an impressive GC. Even if it is introduced once, it’s hard to tell whether it will be used permanently, but in any case things will proceed in a direction where we can expect some improvement in speed.
Parser
Regarding the specification, it’s very likely that nested method calls without parentheses will be forbidden. As we’ve seen, command_call has a great influence all over the grammar. If this is simplified, both the parser and the scanner will also be simplified a lot. However, the ability to omit parentheses itself will never be removed.
And regarding its implementation, whether we continue to use yacc is still under discussion. If we don’t, it would mean writing it by hand, but is it possible to implement such a complex thing by hand? Such anxiety remains. Whichever way we choose, the path will be thorny.
Evaluator
Theevaluatorwillbecompletelyrecreated.Itsaimsaremainlytoimprovespeedandtosimplifytheimplementation.Therearetwomainviewpoints:
removerecursivecallslikerb_eval()switchtoabytecodeinterpretor
First,removingrecursivecallsofrb_eval().Thewaytoremoveis,maybethemostintuitiveexplanationisthatit’slikethe“tailrecursive→gotoconversion”.Insideasinglerb_eval(),circlingaroundbyusinggoto.Thatdecreasesthenumberoffunctioncallsandremovesthenecessityofsetjmp()thatisusedforreturnorbreak.However,whenafunctiondefinedinCiscalled,callingafunctionisinevitable,andatthatpointsetjmp()willstillberequired.
Bytecodeis,inshort,somethinglikeaprogramwritteninmachinelanguage.ItbecamefamousbecauseofthevirtualmachineofSmalltalk90,itiscalledbytecodebecauseeachinstructionisone-byte.Forthosewhoareusuallyworkingatmoreabstractlevel,bytewouldseemsonaturalbasisinsizetodealwith,butinmanycaseseachinstructionconsistsofbitsinmachinelanguages.Forexample,inAlpha,amonga32-bitinstructioncode,thebeginning6-bitrepresentstheinstructiontype.
The advantage of bytecode interpreters is mainly speed. There are two reasons: firstly, unlike syntax trees, there’s no need to traverse pointers. Secondly, it’s easy to do peephole optimization.
Also, when bytecode is saved and read back in later, there’s no need to parse, so we can naturally expect better performance. However, parsing is done only once at the beginning of a program, and even now it does not take much time. Therefore, its influence will not be large.
If you’d like to know what a bytecode evaluator could look like, regex.c is worth a look. As another example, Python is a bytecode interpreter.
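To make the two speed arguments concrete, here is a minimal sketch of a stack-based bytecode interpreter with a one-pass peephole optimizer. The instruction set (OP_PUSH, OP_ADD, OP_MUL, OP_HALT) and the functions vm_run() and peephole() are assumptions invented for this illustration; they are not taken from ruby, regex.c, or Python. The interpreter reads a flat byte array instead of following node pointers, and the optimizer only looks at a small window of adjacent instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction set: every opcode is one byte, and
 * OP_PUSH is followed by a one-byte immediate operand. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Runs a program and returns the top of the stack at OP_HALT. */
long vm_run(const unsigned char *code)
{
    long stack[64];
    size_t sp = 0;                       /* number of values on the stack */
    const unsigned char *pc = code;

    for (;;) {
        switch (*pc++) {
          case OP_PUSH:
            assert(sp < 64);
            stack[sp++] = *pc++;
            break;
          case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
          case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
          case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
          default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}

/* Peephole pass: rewrites "PUSH a; PUSH b; ADD" as "PUSH a+b" when the
 * sum fits in one byte. Copies from in to out (out must hold at least
 * len bytes) and returns the new length. It is a single pass, so it
 * does not re-examine the instructions it has just produced. */
size_t peephole(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i = 0, o = 0;

    while (i < len) {
        if (i + 4 < len && in[i] == OP_PUSH && in[i + 2] == OP_PUSH &&
            in[i + 4] == OP_ADD && in[i + 1] + in[i + 3] <= 255) {
            out[o++] = OP_PUSH;
            out[o++] = (unsigned char)(in[i + 1] + in[i + 3]);
            i += 5;
        } else if (in[i] == OP_PUSH && i + 1 < len) {
            out[o++] = in[i++];          /* opcode */
            out[o++] = in[i++];          /* operand, copied untouched */
        } else {
            out[o++] = in[i++];
        }
    }
    return o;
}
```

For the nine-byte program PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT, vm_run() returns 20; peephole() shrinks it to the six bytes PUSH 5, PUSH 4, MUL, HALT, which also evaluate to 20.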
Thread
Regarding threads, the issue is native thread support. The environment around threads has improved significantly compared with the situation in 1994, the year of Ruby’s birth. So it might be judged that we can get along with native threads now.
Using native threads means being preemptive at the C level as well, so the interpreter itself must be multi-thread safe, but it seems this point is going to be solved by using a global lock for the time being.
Additionally, that somewhat arcane “continuation” seems likely to be removed. ruby’s continuation depends heavily on the implementation of threads, so naturally it will disappear if threads are switched to native threads. That feature exists because “it can be implemented”, and it is rarely used in practice. Therefore there might be no problem.
M17N
In addition, I’d like to mention a few things about class libraries. This is about multi-lingualization (M17N for short). What it means exactly in the context of programming is being able to deal with multiple character encodings.
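As a rough illustration of what “dealing with multiple character encodings” involves, the following sketch counts characters in a byte string under two different encodings. The names enum encoding, char_len(), and count_chars() are hypothetical and are not the interface of the M17N ruby; the sketch also assumes well-formed input and does no validation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding tags, not an actual ruby interface. */
enum encoding { ENC_UTF8, ENC_EUCJP };

/* Byte length of the character whose lead byte is c.
 * Assumes well-formed input; an unexpected lead byte counts as 1. */
static size_t char_len(enum encoding enc, unsigned char c)
{
    switch (enc) {
      case ENC_UTF8:
        if (c < 0x80) return 1;
        if ((c & 0xE0) == 0xC0) return 2;
        if ((c & 0xF0) == 0xE0) return 3;
        if ((c & 0xF8) == 0xF0) return 4;
        return 1;
      case ENC_EUCJP:
        if (c == 0x8F) return 3;                       /* three-byte form */
        if (c == 0x8E || (c >= 0xA1 && c <= 0xFE)) return 2;
        return 1;                                      /* ASCII */
    }
    return 1;
}

/* Counts characters, not bytes, in the first nbytes bytes of s. */
size_t count_chars(enum encoding enc, const unsigned char *s, size_t nbytes)
{
    size_t i = 0, n = 0;

    assert(s != NULL || nbytes == 0);
    while (i < nbytes) {
        i += char_len(enc, s[i]);
        n++;
    }
    return n;
}
```

The same three-character string takes nine bytes when each character needs three bytes in UTF-8 and six bytes when each needs two in EUC-JP, yet count_chars() reports 3 in both cases as long as it is told the right encoding. Which encoding a given string is in has to be known from somewhere, and that is the kind of question an M17N interface has to answer.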
ruby with multi-lingualization support has already been implemented, and you can obtain it from the ruby_m17n branch of the CVS repository. It has not been merged yet because its specification is judged to be immature. If good interfaces are designed, it will be merged at some point in the middle of 1.9.
IO
The IO class in current Ruby is a simple wrapper of stdio, but with this approach,
there are too many slight differences between platforms
we’d like to have finer control over buffers
these two points cause complaints. Therefore, it seems Rite will have its own stdio.
Ruby Hacking Guide
So far, we’ve always acted as observers who look at ruby from outside. But, of course, ruby is not a product displayed in a showcase. That means we can influence it if we take action. In the last section of this book, I’ll introduce the suggestions and activities for ruby from the community, as a farewell gift for Ruby Hackers both present and future.
Generational GC
First, as also mentioned in Chapter 5, there is the generational GC made by Mr. Kiyama Masato. As described before, with the current patch,
it is less fast than expected
it needs to be updated to fit the edge ruby
these points are problems, but here I’d like to value it highly because, more than anything else, it was the first large non-official patch.
Oniguruma
The regular expression engine used by current Ruby is a remodeled version of GNU regex. That GNU regex was originally written for Emacs. Then it was remodeled so that it could support multi-byte characters. And then Mr. Matsumoto remodeled it to be compatible with Perl. As we can easily imagine from this history, its construction is really intricate and spooky. Furthermore, due to the LGPL license of this GNU regex, the license of ruby is very complicated, so replacing this engine has been an issue for a long time.
What suddenly emerged here is the regular expression engine “Oniguruma” by Mr. K. Kosako. I hear it is written really well, and it is likely to be merged as soon as possible.
You can obtain Oniguruma from ruby’s CVS repository in the following way.
% cvs -d :pserver:[email protected]:/src co oniguruma
ripper
Next, ripper is my product. It is an extension library made by remodeling parse.y. It is not a change applied to ruby’s main body, but I introduce it here as one possible direction for making the parser a component.
It is implemented with a kind of streaming interface, and it can pick up things such as token scans or the parser’s reductions as events. It is included on the attached CD-ROM\footnote{ripper: archives/ripper-0.0.5.tar.gz of the attached CD-ROM}, so I’d like you to give it a try. Note that the supported grammar is a little different from the current one because this version is based on ruby 1.7 from almost half a year ago.
I created this just because “I happened to come up with the idea”; taking that into account, I think it is constructed well. It took only three days or so to implement, really just a piece of cake.
A parser alternative
This product has not yet appeared in a clear form, but there’s a person who is writing a Ruby parser in C++ that can be used totally independently of ruby ([ruby-talk:50497]).
JRuby
More aggressively, there’s an attempt to rewrite the entire interpreter. For example, a Ruby written in Java, JRuby\footnote{JRuby http://jruby.sourceforge.net}, has appeared. It seems it is being implemented by a large group of people, Mr. Jan Arne Petersen and many others.
I tried it a little, and my impressions are:
the parser is written really well. It precisely handles even fine behaviors such as spaces or here documents.
instance_eval seems not to work (probably it couldn’t be helped).
it has only a few built-in libraries so far (couldn’t be helped either).
we can’t use extension libraries with it (naturally).
because Ruby’s UNIX-centric parts are all cut out, there’s little possibility that we can run already-existing scripts without any change.
slow
Perhaps I could say at least these things. Regarding the last one, “slow”, the execution time is 20 times longer than that of the original ruby. This is too slow. It is not expected to run fast because that Ruby VM runs on the Java VM. Waiting for machines to become 20 times faster seems to be the only way.
However, the overall impression I got was that it’s way better than I imagined.
NETRuby
If it can run with Java, it should also run with C#. Therefore, a Ruby written in C# appeared, “NETRuby\footnote{NETRuby http://sourceforge.jp/projects/netruby/}”. The author is Mr. arton.
Because I don’t have any .NET environment at hand, I checked only the source code, but according to the author,
more than anything, it’s slow
it has only a few class libraries
the compatibility of exception handling is not good
such things are the problems. But instance_eval works (astounding!).
How to join ruby development
ruby’s developer is really Mr. Matsumoto as an individual; regarding the final decision about the direction ruby will take, he has the definitive authority. But at the same time, ruby is open source software, and anyone can join the development. Joining means you can offer your opinions or send patches. The following describes concretely how to join.
In ruby’s case, the mailing lists are at the center of the development, so it’s good to join them. There are currently three mailing lists at the center of the community: ruby-list, ruby-dev, and ruby-talk. ruby-list is a mailing list for “anything relating to Ruby” in Japanese. ruby-dev is for the development version of ruby, and it is also in Japanese. ruby-talk is an English mailing list. How to join is shown on the “mailing lists” page at Ruby’s official site\footnote{Ruby’s official site: http://www.ruby-lang.org/ja/}. Read-only members are also welcome on these mailing lists, so I recommend just joining first and watching the discussions to get a feel for them.
Though Ruby’s activity started in Japan, recently it is sometimes said that “the main authority now belongs to ruby-talk”. But the center of the development is still ruby-dev. Because the people who have commit rights to ruby (e.g. core members) are mostly Japanese, the difficulty of and reluctance toward using English naturally lead them to ruby-dev. If there are more core members who prefer to use English, the situation could change, but for the time being the core of ruby’s development will probably remain ruby-dev.
However, it’s bad if people who cannot speak Japanese cannot join the development, so currently a summary of ruby-dev is translated once a week and posted to ruby-talk. I also help with that summarizing, but only three people take turns doing it now, so the situation is really harsh. Members to help summarize are always in demand. If you think you’re a person who can help, I’d like you to say so on ruby-list.
And as a last note, source code alone is not enough for software. It’s necessary to prepare various documents and maintain web sites. And people who take care of these kinds of things are always in short supply. There’s also a mailing list for the document-related activities, but as a first step you just have to propose “I’d like to do something” on ruby-list. I’ll answer as much as possible, and other people will respond, too.
Finale
The long journey of this book is coming to an end now. As there was a limit on the number of pages, explaining all of the parts comprehensively was impossible; however, I told everything I could tell about ruby’s core. I won’t add anything more here. If you still have things you didn’t understand, I’d like you to investigate them by reading the source code yourself as much as you want.
The original work is Copyright © 2002 - 2004 Minero AOKI.
Translated by Vincent ISAMBART and Clifford Escobar CAOILE
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License