C--: a portable assembly language that supports garbage collection

Simon Peyton Jones 1, Norman Ramsey 2, and Fermin Reig 3

1 [email protected], Microsoft Research Ltd
2 [email protected], University of Virginia
3 [email protected], University of Glasgow

Abstract. For a compiler writer, generating good machine code for a variety of platforms is hard work. One might try to reuse a retargetable code generator, but code generators are complex and difficult to use, and they limit one’s choice of implementation language. One might try to use C as a portable assembly language, but C limits the compiler writer’s flexibility and the performance of the resulting code. The wide use of C, despite these drawbacks, argues for a portable assembly language. C-- is a new language designed expressly for this purpose. The use of a portable assembly language introduces new problems in the support of such high-level run-time services as garbage collection, exception handling, concurrency, profiling, and debugging. We address these problems by combining the C-- language with a C-- run-time interface. The combination is designed to allow the compiler writer a choice of source-language semantics and implementation techniques, while still providing good performance.

1 Introduction

Suppose you are writing a compiler for a high-level language. How are you to generate high-quality machine code? You could do it yourself, or you could try to take advantage of the work of others by using an off-the-shelf code generator. Curiously, despite the huge amount of research in this area, only three retargetable, optimizing code generators appear to be freely available: VPO (Benitez and Davidson 1988), ML-RISC (George 1996), and the gcc back end (Stallman 1992). Each of these impressive systems has a rich, complex, and ill-documented interface. Of course, these interfaces are quite different from one another, so once you start to use one, you will be unable to switch easily to another. Furthermore, they are language-specific. To use ML-RISC you must write your front end in ML, to use the gcc back end you must write it in C, and so on.

All of this is most unsatisfactory. It would be much better to have one portable assembly language that could be generated by a front end and implemented by any of the available code generators. So pressing is this need that it has become common to use C as a portable assembly language (Atkinson et al. 1989; Bartlett 1989b; Peyton Jones 1992; Tarditi, Acharya, and Lee 1992; Henderson, Conway, and Somogyi 1995; Pettersson 1995; Serrano and Weis 1995). Unfortunately, C was never intended for this purpose – it is a programming language, not an assembly language. C locks the implementation into a particular calling convention, makes it impossible to compute targets of jumps, provides no support for garbage collection, and provides very little support for exceptions or debugging (Section 2).

The obvious way forward is to design a language specifically as a compiler target language. Such a language should serve as the interface between a compiler for a high-level language (the front end) and a retargetable code generator (the back end). The language would not only make the compiler writer’s life much easier, but would also give the author of a new code generator a ready-made customer base. In an earlier paper we propose a design for just such a language, C-- (Peyton Jones, Oliva, and Nordin 1998), but the story does not end there. Separating the front and back ends greatly complicates run-time support. In general, the front end, back end, and run-time system for a programming language are designed together. They cooperate intimately to support such high-level features as garbage collection, exception handling, debugging, profiling, and concurrency – high-level run-time services. If the back end is a portable assembler like C--, we want the cooperation without the intimacy; an implementation of C-- should be independent of the front ends with which it will be used.

One alternative is to make all these high-level services part of the abstraction offered by the portable assembler. For example, the Java Virtual Machine, which provides garbage collection and exception handling, has been used as a target for languages other than Java, including Ada (Taft 1996), ML (Benton, Kennedy, and Russell 1998), Scheme (Clausen and Danvy 1998), and Haskell (Wakeling 1998). But a sophisticated platform like a virtual machine embodies too many design decisions. For a start, the semantics of the virtual machine may not match the semantics of the language being compiled (e.g., the exception semantics). Even if the semantics happen to match, the engineering tradeoffs may differ dramatically. For example, functional languages like Haskell or Scheme allocate like crazy (Diwan, Tarditi, and Moss 1993), and JVM implementations are typically not optimised for this case. Finally, a virtual machine typically comes complete with a very large infrastructure – class loaders, verifiers and the like – that may well be inappropriate. Our intended level of abstraction is much, much lower.

Our problem is to enable a client to implement high-level services, while still using C-- as a code generator. As we discuss in Section 4, supporting high-level services requires knowledge from both the front and back ends. The insight behind our solution is that C-- should include not only a low-level assembly language, for use by the compiler, but also a low-level run-time system, for use by the front end’s run-time system. The only intimate cooperation required is between the C-- back end and its run-time system; the front end works with C-- at arm’s length, through a well-defined language and a well-defined run-time interface (Section 5). This interface adds something fundamentally new: the ability to inspect and modify the state of a suspended computation.

It is not obvious that this approach is workable. Can just a few assembly-language capabilities support many high-level run-time services? Can the front-end run-time system easily implement high-level services using these capabilities? How much is overall efficiency compromised by the arms-length relationship between the front-end runtime and the C-- runtime? We cannot yet answer these questions definitively. Instead, the primary contributions of this paper are to identify needs that are common to various high-level services, and to propose specific mechanisms to meet these needs. We demonstrate only how to use C-- to implement the easiest of our intended services, namely garbage collection. Refining our design to accommodate exceptions, concurrency, profiling, and debugging has emerged as an interesting research challenge.

2 It’s impossible – or it’s C

The dream of a portable assembler has been around at least since UNCOL (Conway 1958). Is it an impossible dream, then? Clearly not: C’s popularity as an assembler is clear evidence that a need exists, and that something useful can be done.

If C is so popular, then perhaps C is perfectly adequate? Not so. There are many difficulties, of which the most fundamental are these:

– The C route rewards those who can map their high-level language rather directly onto C. A high-level language procedure becomes a C procedure, and so on. But this mapping is often awkward, and sometimes impossible.

  For example, some source languages fundamentally require tail-call optimisation; a procedure call whose result is returned to the caller of the current procedure must be executed in the stack frame of the current procedure. This optimisation allows iteration to be implemented efficiently using recursion. More generally, it allows one to think of a procedure as a labelled extended basic block that can be jumped to, rather than as a sub-program that can only be called. Such procedures give a front end the freedom to design its own control flow.

  It is very difficult to implement the tail-call optimisation in C, and no C compiler known to us does so across separately compiled modules. Those using C have been very ingenious in finding ways around this deficiency (Steele 1978; Tarditi, Acharya, and Lee 1992; Peyton Jones 1992; Henderson, Conway, and Somogyi 1995), but the results are complex, fragile, and heavily tuned for one particular implementation of C (usually gcc).

– A C compiler may lay out its stack frames as it pleases. This makes it difficult for a garbage collector to find the live pointers. Implementors either arrange not to keep pointers on the C stack, or they use a conservative garbage collector. These restrictions are Draconian.

– The unknown stack-frame layout also complicates support for exception handling, debugging, profiling, and concurrency. For example, an exception-handling mechanism needs to walk the stack, perhaps removing stack frames as it goes. Again, C makes it essentially impossible to implement such mechanisms, unless they can be closely mapped onto what C provides (i.e., setjmp and longjmp).

– A C compiler has to be very conservative about the possibility of memory aliasing. This seriously limits the ability of the instruction scheduler to permute memory operations or hoist them out of a loop. The front-end compiler often knows that aliasing cannot occur, but there is no way to convey this information to the C compiler.
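To make the tail-call difficulty concrete, here is a minimal sketch of one well-known style of workaround, often called a trampoline. It is an illustration only, and is not claimed to be the exact scheme of any of the systems cited above. All names (`machine`, `step`, `sum_prod_loop`, `run`, `sum_prod`) are hypothetical. Each "procedure" returns the next procedure to run instead of calling it, and a driver loop performs every transfer of control, so the C stack does not grow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is a sketch, not a cited system. */

/* Arguments and results live here instead of on the C stack. */
struct machine {
    unsigned n, s, p;
};

/* A step returns the next step to run, or a NULL fn to stop. Wrapping
   the function pointer in a struct lets the type refer to itself. */
struct step {
    struct step (*fn)(struct machine *);
};

static struct step sum_prod_loop(struct machine *m)
{
    struct step next = { NULL };
    if (m->n == 1) {
        return next;            /* finished: results are in m->s and m->p */
    }
    m->s += m->n;
    m->p *= m->n;
    m->n -= 1;
    next.fn = sum_prod_loop;    /* "tail call": hand control back to the driver */
    return next;
}

/* The trampoline: one loop performs every transfer of control. */
static void run(struct step start, struct machine *m)
{
    struct step cur = start;
    while (cur.fn != NULL) {
        cur = cur.fn(m);
    }
}

/* Sum and product of 1..n, for n >= 1. */
static void sum_prod(unsigned n, unsigned *s, unsigned *p)
{
    struct machine m = { n, 1, 1 };
    struct step start = { sum_prod_loop };
    assert(n >= 1);             /* n == 0 would never reach the base case */
    run(start, &m);
    *s = m.s;
    *p = m.p;
}
```

The cost is visible in the sketch: every transfer goes through an indirect call and a return, and arguments travel through memory rather than registers, which is the kind of overhead a native jump would avoid.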

So much for fundamental issues. C also lacks the ability to control a number of important low-level features, including returning multiple values in registers from a procedure, mis-aligned memory accesses, arithmetic, data layout, and omitting range checks on multi-way jumps.

In short, C is awkward to use as a portable assembler, and many of these difficulties translate into performance hits. A portable assembly language should be able to offer better performance, as well as greater ease of use.

3 An overview of C--

    /* Ordinary recursion */
    export sp1;
    sp1( bits32 n ) {
      bits32 s, p;
      if n == 1 {
        return( 1, 1 );
      } else {
        s, p = sp1( n-1 );
        return( s+n, p*n );
      }
    }

    /* Tail recursion */
    export sp2;
    sp2( bits32 n ) {
      jump sp2_help( n, 1, 1 );
    }

    sp2_help( bits32 n, bits32 s, bits32 p ) {
      if n==1 {
        return( s, p );
      } else {
        jump sp2_help( n-1, s+n, p*n );
      }
    }

    /* Loops */
    export sp3;
    sp3( bits32 n ) {
      bits32 s, p;
      s = 1; p = 1;
    loop:
      if n==1 {
        return( s, p );
      } else {
        s = s+n;
        p = p*n;
        n = n-1;
        goto loop;
      }
    }

Fig. 1. Three functions that compute the sum $\sum_{i=1}^{n} i$ and product $\prod_{i=1}^{n} i$, written in C--.

In this section we give an overview of the design of C--. Fuller descriptions can be found in Peyton Jones, Oliva, and Nordin (1998) and in Reig and Peyton Jones (1998). Figure 1 gives examples of some C-- procedures that give a flavour of the language. Despite its name C-- is by no means a subset of C, as will become apparent; C was simply our jumping-off point.
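For comparison, the following is a rough C rendering of sp1 from Figure 1. It is a sketch under assumptions: the names `sum_prod_result` and `sp1_c` are hypothetical, and `unsigned` merely stands in for bits32. C has no multiple results, so the pair is packaged in a struct; whether a given compiler returns that struct in registers or through memory depends on the platform's calling convention, which the C programmer cannot choose.

```c
#include <assert.h>

/* Hypothetical C counterpart of sp1 from Figure 1. */
struct sum_prod_result {
    unsigned s;   /* sum     1 + 2 + ... + n */
    unsigned p;   /* product 1 * 2 * ... * n */
};

static struct sum_prod_result sp1_c(unsigned n)
{
    struct sum_prod_result r;
    assert(n >= 1);            /* as in Figure 1, n == 1 is the base case */
    if (n == 1) {
        r.s = 1;
        r.p = 1;
    } else {
        r = sp1_c(n - 1);      /* not a tail call: work remains after it returns */
        r.s += n;
        r.p *= n;
    }
    return r;
}
```

For instance, `sp1_c(4)` yields a sum of 10 and a product of 24.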

3.1 What is a portable assembler?

C-- is an assembly language – an abstraction of hardware – not a high-level programming language. Hardware provides computation, control flow, memory, and registers; C-- provides corresponding abstractions.

– C-- expressions and assignments are abstractions of computation. C-- provides a rich set of computational operators, but these operators work only on machine-level data types: bytes, words, etc. The expression abstraction hides the particular combination of machine instructions needed to compute values, and it hides the machine registers that may be needed to hold intermediate results.

– C--’s goto and if statements are abstractions of control flow. (For convenience, C-- also provides structured control-flow constructs.) The if abstraction hides the machine’s “condition codes”; branch conditions are arbitrary Boolean expressions.

– C-- treats memory much as the machine does, except that addresses used in C-- programs may be arbitrary expressions. This abstraction hides the limitations of the machine’s addressing modes.

– C-- variables are an abstraction of registers. A C-- back end puts as many variables as possible in registers; others go in memory. This abstraction hides the number and conventional uses of the machine’s registers.

– In addition, C-- provides a procedure abstraction, the feature that looks least like an abstraction of a hardware primitive. However, many processor architectures provide direct support for procedures, although the nature of that support varies widely (procedure call or multiple register save instructions, register windows, link registers, branch prediction for return instructions, and so on). Because of this variety, calling conventions and activation-stack management are notoriously architecture dependent and hard to specify. C-- therefore offers procedures as a primitive abstraction, albeit in a slightly unusual form (Section 3.4).

Our goal is to make it easy to retarget front ends, not to make every C-- program runnable everywhere. Although every C-- program has a well-defined semantics that is independent of any machine, a front end translating a single source program might need to generate two different C-- programs for two different target architectures. For example, a C-- program generated for a machine with floating-point instructions would be different from a C-- program generated for a machine without floating-point instructions.

Our goal contrasts sharply with “write once; run anywhere,” the goal of such distribution formats as Java class files, Juice files (Franz 1997), and ANDF or TenDRA (Macrakis 1993). These formats are abstractions of high-level languages, not of underlying machines. Their purpose is binary portability, and they retain enough high-level language semantics to permit effective compilation at the remote installation site.

Even though C-- exposes a few architecture-specific details, like word size, the whole point is to hide those details, so that the front end job can be largely independent of the target architecture. A good C-- implementation therefore must do substantial architecture-dependent work. For example:

– Register allocation.

– Instruction selection, exploiting complex instructions and addressing modes.

– Instruction scheduling.

– Stack-frame layout.

– Classic back-end optimisations such as common-subexpression elimination and copy propagation.

– If-conversion for predicated architectures.

Given these requirements, C-- resembles a typical compiler’s intermediate language more than a typical machine’s assembly language.

3.2 Types

C-- supports a bare minimum of data types: a family of bits types (bits8, bits16, bits32, bits64), and a family of floating-point types (float32, float64, float80). These types encode only the size (in bits) and the kind of register (general-purpose or floating-point) required for the datum.

Not all types are available on all machines; for example, a C-- program emitted by a front-end compiler for a 64-bit machine might be rejected if fed to a C-- implementation for a 32-bit machine. It is easy to tell a front end how big to make its data types, and doing so makes the front end’s job easier in some ways; for example, it can compute offsets statically.

The bits types are used for characters, bit vectors, integers, and addresses (pointers). On each architecture, a bits type is designated the “native word type” of the machine. A “native code-pointer type” and “native data-pointer type” are also designated; exported and imported names must have one of these pointer types. On many machines, all three types are the same, e.g., bits32.

3.3 Static allocation

C-- offers detailed control of static memory layout, much as ordinary assemblers do. A data block consists of a sequence of labels, initialised data values, uninitialised arrays, and alignment directives. For example:

    data {
      foo:  bits32{10};        /* One bits32 initialised to 10 */
            bits32{1,2,3,4};   /* Four initialised bits32's */
            bits32[8];         /* Uninitialised array of 8 bits32's */
      baz1:
      baz2: bits8              /* An uninitialised byte */
      end:
    }

Here foo is the address of the first bits32, baz1 and baz2 are both the address of the bits8, and end is the address of the byte after the bits8. The labels foo, baz1, etc., should be thought of as addresses, not as memory locations. They are all immutable constants of the native data-pointer type; they cannot be assigned to.

How, then, can one access the memory at location foo? Memory accesses (loads and stores) are typed, and denoted with square brackets. Thus the statement:

    bits32[foo] = bits32[foo] + 1;

loads a bits32 from the location whose address is in foo, adds one to it, and stores it at the same location. The mnemonic for this syntax is to think of bits32 as a C-like array representing all of memory, and bits32[foo] as a particular element of that array. The semantics of the address is not C-like, however; the expression in brackets is the byte address of the item. Further, foo’s type is always the native data-pointer type; the type of value stored at foo is specified by the load or store operation itself. So this is perfectly legal:

    bits8[foo+2] = bits8[foo+2] - 1;

This statement modifies only the byte at address foo+2.
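The typed, byte-addressed accesses just described can be modelled in C as a rough sketch. Everything here is an assumption made for illustration: memory is simulated by a small byte array named `memory`, addresses are indices into it, and the helper names (`load_bits32`, `store_bits32`, `load_bits8`, `store_bits8`, `increment_word`, `decrement_byte2`) are hypothetical. The point is that the width of an access comes from the operation, not from the address.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: all of memory is one byte array, and an
   "address" is a byte index into it. */
static uint8_t memory[64];

/* bits32[addr] as a value: load four bytes starting at byte address addr.
   memcpy is used so that unaligned addresses are also well defined in C. */
static uint32_t load_bits32(size_t addr)
{
    uint32_t v;
    memcpy(&v, &memory[addr], sizeof v);
    return v;
}

/* bits32[addr] = v; */
static void store_bits32(size_t addr, uint32_t v)
{
    memcpy(&memory[addr], &v, sizeof v);
}

static uint8_t load_bits8(size_t addr)           { return memory[addr]; }
static void store_bits8(size_t addr, uint8_t v)  { memory[addr] = v; }

/* bits32[foo] = bits32[foo] + 1; */
static void increment_word(size_t foo)
{
    store_bits32(foo, load_bits32(foo) + 1);
}

/* bits8[foo+2] = bits8[foo+2] - 1;  touches exactly one byte */
static void decrement_byte2(size_t foo)
{
    store_bits8(foo + 2, (uint8_t)(load_bits8(foo + 2) - 1));
}
```

Which byte of the word sits at foo+2 depends on the byte order of the host machine, so the sketch makes no claim about the numeric effect of `decrement_byte2` on the enclosing word.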

Unlike C, C-- has no implicit alignment or padding. Therefore, the address relationships between the data items within a single data block are machine-independent; for example, $\mathtt{baz1} = \mathtt{foo} + 52$. An explicit align directive provides alignment where that is required.
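The offset of 52 follows from summing the sizes of the items that precede baz1 in the example data block: 4 bytes for the first bits32, 16 for the four initialised bits32 values, and 32 for the array of eight. The short C sketch below spells out that arithmetic; the constant and function names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Byte offsets of the items in the example data block, relative to foo.
   With no implicit padding, each offset is the previous offset plus the
   size of the previous item. */
enum {
    FOO_OFFSET   = 0,                      /* foo: bits32{10}        */
    ANON4_OFFSET = FOO_OFFSET + 4,         /* bits32{1,2,3,4}        */
    ARRAY_OFFSET = ANON4_OFFSET + 4 * 4,   /* bits32[8]              */
    BAZ1_OFFSET  = ARRAY_OFFSET + 8 * 4,   /* baz1, baz2: bits8      */
    END_OFFSET   = BAZ1_OFFSET + 1         /* end: byte after bits8  */
};

/* Distance in bytes from foo to baz1. */
static size_t baz1_minus_foo(void)
{
    return (size_t)(BAZ1_OFFSET - FOO_OFFSET);
}
```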

C-- supports multiple, named data sections. For example:

    data "debug" {
      ...
    }

This syntax declares the block of data to belong to the section named "debug". Code is by default placed in the section "text", and a data directive with no explicit section name defaults to the section "data". Procedures can be enclosed in code "mytext" { ... } to place them in a named section "mytext".

C-- expects that, when linking object files, the linker concatenates sections with the same name. (For backwards compatibility with some existing linkers, front ends may wish to emit an alignment directive at the beginning of each C-- section.) C-- assigns no other semantics to the names of data sections, but particular implementations may assign machine-dependent semantics. For example, a MIPS implementation might assume that data in sections named "rodata" is read-only.

3.4 Procedures

C-- supports procedures that are both more and less general than C procedures – for example, C-- procedures offer multiple results and full tail calls, but they have a fixed number of arguments. Specifically:

– A C-- procedure, such as sp1 in Figure 1, has parameters, such as n, and local variables, such as s and p. Parameters and variables are mapped onto machine registers where possible, and only spilled to the stack when necessary. In this absolutely conventional way C-- abstracts away from the number of machine registers actually available. As with registers, C-- provides no way to take the address of a parameter or local variable.

– C-- supports fully general tail calls, identified as “jumps”. Control does not return from jumps, and C-- implementations must deallocate the caller’s stack frame before each jump. For example, the procedure sp2_help in Figure 1 uses a jump to implement tail recursion.

– C-- supports procedures with multiple results, just as it supports procedures with multiple arguments. Indeed, a return is somewhat like a jump to a procedure whose address happens to be held in the topmost activation record on the control stack, rather than being specified explicitly. All the procedures in Figure 1 return two results; procedure sp1 contains a call site for such a procedure.

– A C-- procedure call is always a complete statement, which passes expressions as parameters and assigns results to local variables. Although high-level languages allow a call to occur in an expression, C-- forbids it. For example, it is illegal to write

      r = f( g(x) ); /* illegal */

  because the result returned by g(x) cannot be an argument to f. Instead, one must write two separate calls:

      y = g(x);
      r = f(y);

  This restriction makes explicit the order of evaluation, the location of each call site, and the names and types of temporaries used to hold the results of calls. (For similar reasons, assignments in C-- are statements, not expressions, and C-- operators have no side effects. In particular, C-- provides no analog of C’s “p++”.)

– To handle high-level variables that can’t be represented using C--’s primitive types, C-- can be asked to allocate named areas in the procedure’s activation record.

      f (bits32 x) {
        bits32 y;
        stack { p : bits32;
                q : bits32[40];
        }
        /* Here, p and q are the addresses of the relevant chunks
           of data. Their type is the native data-pointer type. */
      }

  stack is rather like data; it has the same syntax between the braces, but it allocates on the stack. As with data, the names are bound to the addresses of the relevant locations, and they are immutable. C-- makes no provision for dynamically-sized stack allocation (yet).

– The name of a procedure is a C-- expression of native code-pointer type. The procedure specified in a call statement can be an arbitrary expression, not simply the statically-visible name of a procedure. For example, the following statements are both valid, assuming the procedure sp1 is defined in this compilation unit, or imported from another one:

      bits32[ptr] = sp1;            /* Store procedure address */
      ...
      r,s = (bits32[ptr])( 4 );     /* Call stored procedure */

– A C-- procedure, like sp3 in Figure 1, may contain gotos and labels, but they serve only to allow a textual representation of the control-flow graph. Unlike procedure names, labels are not values, and they have no representation at run time. Because this restriction makes it impossible for front ends to build jump tables from labels, C-- includes a switch statement, for which the C-- back end generates efficient code. The most efficient mix of conditional branches and indexed branches may depend on the architecture (Bernstein 1985).

  Jump tables of procedure addresses (rather than labels) can be built, of course, and a C-- procedure can use the jump statement to make a tail call to a computed address.
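The closest C analogue of a jump table of procedure addresses is an array of function pointers, sketched below. The handler names, the `table` array, and `dispatch` are hypothetical and chosen only for this illustration. Unlike a C-- jump, the indirect call in `dispatch` is an ordinary C call in tail position: a C compiler may or may not compile it as a jump, and the language gives no way to insist.

```c
#include <stddef.h>

/* Hypothetical handlers; each stands for a procedure whose address
   is stored in the table. */
typedef int (*handler)(int);

static int op_negate(int x) { return -x; }
static int op_double(int x) { return 2 * x; }
static int op_square(int x) { return x * x; }

/* The jump table: procedure addresses indexed by a small tag. */
static const handler table[] = { op_negate, op_double, op_square };

/* Transfer control to the procedure selected by tag. Sets *ok to 0 and
   returns 0 if tag is out of range, otherwise sets *ok to 1. */
static int dispatch(size_t tag, int x, int *ok)
{
    if (tag >= sizeof table / sizeof table[0]) {
        *ok = 0;
        return 0;
    }
    *ok = 1;
    return table[tag](x);   /* call to a computed address, in tail position */
}
```

The explicit range check is the C programmer's responsibility here; Section 2 lists the inability to omit such checks on multi-way jumps as one of C's limitations.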

3.5 Calling conventions

The calling convention for C-- procedures is entirely a matter for the C-- implementation – we call it the standard C-- calling convention. In particular, C-- need not use the C calling convention.

The standard calling convention places no restrictions on the number of arguments passed to a function or the number of results returned from a function. The only restrictions are that the number and types of actual parameters must match those in the procedure declaration, and similarly, that the number and types of values returned must match those expected at the call site. These restrictions enable efficient calling sequences with no dynamic checks. (A C-- implementation need not check that C-- programs meet these restrictions.)

We note the following additional points:

– If a C-- function does not “escape” – if all sites where it is called can be identified statically – then the C-- back end is free to create and use specialised instances, with specialised calling conventions, for each call site. Escape analysis is necessarily conservative, but a function may be deemed to escape only if its name is used other than in a call, or if it is named in an export directive.

– Support for unrestricted tail calls requires an unusual calling convention, so that a procedure making a tail call can deallocate its activation record while still leaving room for parameters that do not fit in registers.

– C-- allows the programmer to specify a particular calling convention (chosen from a small set of standard conventions) for an individual procedure, so that C-- code can interoperate with foreign code. For example, even though C--’s standard calling convention may differ from C’s, one can ask for a particular procedure to use C’s convention, so that the procedure can be called from an external C program. Similarly, external C procedures can be called from a C-- procedure by specifying the calling convention at the call site.

  Some C-- implementations may provide two versions of C’s calling convention. The lightweight version would be like an ordinary C call, but it would be useful only when the C procedures terminate quickly; if control were transferred to the run-time system while a C procedure was active, the run-time system might not be able to find values that were in callee-saves registers at the time of the call. The heavyweight version would keep all its state on the stack, not in callee-saves registers, so the run-time system could handle a stack containing a mix of C and C-- activations.

3.6 Miscellaneous

Like other assemblers, C-- gives programmers the ability to name compile-time constants, e.g., by

    const GC = 2;

C-- variables may be declared global, in which case the C-- compiler attempts to put them in registers. For example, given the declaration

    global {
      bits32 hp;
    }

the implementation attempts to put variable hp in a register, but if no register is available, it puts hp in memory. C-- programs use and assign to hp without knowing whether it is in a register or in memory. Unlike storage allocated by data, there is no such thing as “the address of a global”, so memory stores to unknown addresses cannot affect the value of a global. This permits a global to be held in a register and, even if it has to be held in memory, the optimiser does not need to worry about re-loading it after a store to an unknown memory address. All separately compiled modules must have identical global declarations, or horribly strange things will happen.

global declarations may name specific (implementation-dependent) registers, for example:

    global {
      bits32 hp "%ebx";
      bits32 hplim "%esi";
    }

We remarked in Section 2 that the front end may know a great deal about (lack of) aliasing between memory access operations. We do not yet have a way to express such knowledge in C--, but an adaptation of Novack, Hummel, and Nicolau (1995) looks promising.

4 The problem of run-time support

When a front end and back end are written together, as part of a single compiler, they can cooperate intimately to support high-level run-time services, such as garbage collection, exception handling, profiling, concurrency, and debugging. In the C-- framework, the front and back ends work at arm’s length. As mentioned earlier, our guiding principle is this:

    C-- should make it possible to implement high-level run-time services, but it should not actually implement any of them. Rather, it should provide just enough “hooks” to allow the front-end run-time system to implement them.

Separating policy from mechanism in this way is easier said than done. It might appear more palatable to incorporate garbage collection, exception handling, and debugging into the C-- language, as (say) the Java Virtual Machine does. But doing so would guarantee that C-- would never be used. Different source languages require different support, different object layouts, and different exception semantics – especially when performance matters. No one back end could satisfy all customers.

Why is the separation between front and back end hard to achieve? High-level run-time services need to inspect and modify the state of a suspended program. A garbage collector must find, and perhaps modify, all live pointers. An exception handler must navigate, and perhaps unwind, the call stack. A profiler must correlate object-code locations with source-code locations, and possibly navigate the call stack. A debugger must allow the user to inspect, and perhaps modify, the values of variables. All of these tasks require information from both front and back ends. The rest of this section elaborates.

Finding roots for garbage collection. If the high-level language requires accurate garbage collection, then the garbage collector must be able to find all the roots that point into the heap. If, furthermore, the collector supports compaction, the locations of heap objects may change during garbage collection, and the collector must be able to redirect each root to point to the new location of the corresponding heap object.

  The difficulty is that neither the front end nor the back end has all the knowledge needed to find roots at run time. Only the front end knows which source-language variables, and therefore which C-- variables, represent pointers into the heap. Only the back end, which maps variables to registers and stack slots, knows where those variables are located at run time. Even the back end can’t always identify exact locations; variables mapped to callee-saves registers may be saved arbitrarily far away in the call stack, at locations not identifiable until run time.

Printing values in a debugger. A debugger needs compiler support to print the values of variables. For this task, information is divided in much the same way as for garbage collection. Only the front end knows how source-language variables are mapped onto (collections of) C-- variables. Only the front end knows how to print the value of a variable, e.g., as determined by the variable’s high-level-language type. Only the back end knows where to find the values of the C-- variables.

Loci of control. A debugger must be able to identify the “locus of control” in each activation, and to associate that locus with a source-code location. This association is used both to plant breakpoints and to report the source-code location when a program faults.

  An exception mechanism also needs to identify the locus of control, because in some high-level languages, that locus determines which handler should receive the exception. When it identifies a handler, the exception mechanism unwinds the stack and changes the locus of control to refer to the handler.

  A profiler must map loci of control into entities that are profiled: procedures, statements, source-code regions, etc.

  At run time, loci of control are represented by values of the program counter (e.g., return addresses), but at the source level, loci of control are associated with statements in a high-level language or in C--. Only the front end knows how to associate high-level source locations or exception-handler scopes with C-- statements. Only the back end knows how to associate C-- statements with the program counter.

Liveness. Depending on the semantics of the original source language, the locus of control may determine which variables of the high-level language are visible. Depending on the optimizations performed by the back end, the locus of control may determine which C-- variables are live, and therefore have values. Debuggers should not print dead variables. Garbage collectors should not trace them; tracing dead pointers could cause space leaks. Worse, tracing a register that once held a root but now holds a non-pointer value could violate the collector’s invariants. Again, only the front end knows which variables are interesting for debugging or garbage collection, but only the back end knows which are live at a given locus of control.

Exception values. In addition to unwinding the stack and changing the locus of control, the exception mechanism may have to communicate a value to an exception handler. Only the front end knows which variable should receive this value, but only the back end knows where variables are located.

Succinctly stated, each of these operations must combine two kinds of information:

– Information that only the front end has:
  • Which C-- parameters and local variables are heap pointers.
  • How to map source-language variables to C-- variables and how to associate source-code locations with C-- statements.
  • Which exception handlers are in scope at which C-- statements, and which variables are visible at which C-- statements.

– Information that only the back end has:
  • Whether each C-- local variable and parameter is live, where it is located (if live), and how this information changes as the program counter changes.
  • Which program-counter values correspond to which C-- statements.
  • How to find activations of all active procedures and how to unwind stacks.

5 Support for high-level run-time services

The main challenge, then, is arranging for the back end and front end to share information, without having to implement them as a single integrated unit. In this section we describe a framework that allows this to be done. We focus on garbage collection as our illustrative example. Other high-level run-time services can fit in the same framework, but each requires service-specific extensions; we sketch some ideas in Section 7.

In what follows, we use the term “variable” to mean either a parameter of the procedure or a locally-declared variable.

5.1 The framework

We assume that executable programs are divided into three parts, each of which may be found in object files, libraries, or a combination.

– The front end compiler translates the high-level source program into one or more C-- modules, which are separately translated to generated object code by the C-- compiler.

– The front end comes with a (probably large) front-end run-time system. This run-time system includes the garbage collector, exception handler, and whatever else the source language needs. It is written in a programming language designed for humans, not in C--; in what follows we assume that the front end run-time system is written in C.

– Every C-- implementation comes with a (hopefully small) C-- run-time system. The main goal of this run-time system is to maintain and provide access to information that only the back end can know. It makes this information available to the front end run-time system through a C-language run-time interface, which we describe in Section 5.2. Different front ends may interoperate with the same C-- run-time system.

To make an executable program, we link generated object code with both run-time systems.

In outline, C-- can support high-level run-time services, such as garbage collection, as follows. When garbage collection is required, control is transferred to the front-end run-time system (Section 6.1). The garbage collector then walks the C-- stack, by calling access routines provided by the C-- run-time system (Section 5.2). In each activation record on the C-- stack, the garbage collector finds the location of each live variable, using further procedures provided by the C-- runtime. However, the C-- runtime cannot know which of these variables holds a pointer. To answer this question, the front-end compiler builds a statically-allocated data block that identifies pointer variables, and it uses a span directive (Section 5.3) to associate this data block with the corresponding procedure’s range of program counter values. The garbage collector combines these two sources of information to decide whether to treat the procedure’s variable as a root. Section 5.4 describes one possible garbage collector in more detail.

5.2 The C-- run-time interface

This section presents the core run-time interface provided by the C-- run-time system. Using this interface, a front-end run-time system can inspect and modify the state of a suspended C-- computation. Rather than specify representations of a suspended computation or its activation records, we hide them behind simple abstractions. These abstractions are presented to the front-end run-time system through a set of C procedures.

The state of a C-- computation consists of some saved registers and a logical stack of procedure activations. This logical stack is usually implemented as some sort of physical stack, but the correspondence between the two may not be very direct. Notably, callee-saves registers that logically belong with one activation are not necessarily stored with that activation, or even with the adjacent activation; they may be stored in the physical record of an activation that is arbitrarily far away. This problem is the reason that C’s setjmp and longjmp functions don’t necessarily restore callee-saves registers, which is why some C compilers make pessimistic assumptions when compiling procedures containing setjmp (Harbison and Steele 1995, §19.4).

We hide this complexity behind a simple abstraction, the activation. The idea of an activation of procedure P is that it approximates the state the machine will be in when control returns to P. The approximation is not completely accurate because other procedures may change the global store or P’s stack variables before control returns to P. At the machine level, the activation corresponds to the “abstract memory” of Ramsey (1992), Chapter 3, which gives the contents of memory, including P’s activation record (stack frame), and of registers.

The activation abstraction hides machine-dependent details and raises the level of abstraction to the C-- source-code level. In particular, the abstraction hides:

– The layout of an activation record, and the encoding used to record that layout for the benefit of the front end runtime,

– The details of manipulating callee-saves registers (whether to use callee-saves registers is entirely up to the C-- implementation), and

– The direction in which the stack grows.

All of these matters become private to the back end and the C-- runtime.

In the C-- run-time interface, an activation record is represented by an activation handle, which is a value of type activation. Arbitrary registers and memory addresses are represented by variables, which are referred to by number.

The procedures in the C-- run-time interface include:

void *FindVar( activation *a, int var_index ) asks an activation handle for the location of any parameter or local variable in the activation record to which the handle refers. The variables of a procedure are indexed by numbering them in the order in which they are declared in that procedure, starting with zero. FindVar returns the address of the location containing the value of the specified variable. The front end is thereby able to examine or modify the value. FindVar returns NULL if the variable is dead. It is a checked runtime error to pass a var_index that is out of range.

void FirstActivation( tcb *t, activation *a ). When execution of a C-- program is suspended, its state is captured by the C-- run-time system. FirstActivation uses that state to initialise an activation handle that corresponds to the procedure that will execute when the program’s execution is resumed.

int NextActivation( activation *a ) modifies the activation handle a to refer to the activation record of a’s caller, or more precisely, to the activation to which control will return when a returns. NextActivation returns nonzero if there is such an activation record, and zero if there is not. That is, NextActivation(&a) returns zero if and only if activation handle a refers to the bottom-most record on the C-- stack.

Notice that FindVar always returns a pointer to a memory location, even though the specified variable might be held in a register at the moment at which garbage collection is required. But by the time the garbage collector is walking the stack, the C-- implementation must have stored all the registers away in memory somewhere, and it is up to the C-- run-time system to figure out where the variable is, and to return the address of the location holding it.

Names bound by stack declarations are considered variables for purposes of FindVar, even though they are immutable. For such names, FindVar returns the value that the name has in C-- source code, i.e., the address of the stack-allocated block of storage. Storing through this address is meaningful; it alters the contents of the activation record a. Stack locations are not subject to liveness analysis.

5.3 Front-end information

Suppose the garbage collector is examining a particular activation record. It can use FindVar to locate variable number 1, but how can it know whether that variable is a pointer? The front end compiler cooperates with C-- to answer this question, as follows:

– The front end builds a static initialised data block (Section 3.3), or descriptor, that says which of the parameters and local variables of a procedure are heap pointers. The format of this data block is known only to the front-end compiler and run-time system; the C-- run-time system does not care.

– The front end tells C-- to associate a particular range of program counters with this descriptor, using a span directive.

– The C-- run-time system provides a call, GetDescriptor, that maps an activation handle to the descriptor associated with the program counter at which the activation is suspended.

We discuss each of these steps in more detail. As an example, suppose we have a function f(x, y), with no other variables, in which x holds a pointer into the heap and y holds an integer. The front end can encode the heap-pointer information by emitting a data block, or descriptor, associating 1 with x and 0 with y:

    data {
      gc1: bits32 2;   /* this procedure has two variables */
           bits8 1;    /* x is a pointer */
           bits8 0;    /* y is a non-pointer */
    }

This encoding does not use the names of the variables; instead, each variable is assigned an integer index, based on the textual order in which it appears in the definition of f. Therefore x has index 0 and y has index 1.

Many other encodings are possible. The front end might emit a table that uses one bit per variable, instead of one byte. It might emit a list of the indices of variables that contain pointers. It might arrange for pointer variables to have contiguous indices and emit only the first and last such index.[1] The key property of our design is that the encoding matters only to the front end and its runtime system. C-- does not know or care about the encoding.
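As one illustration of the one-bit-per-variable alternative, the front-end run-time system might decode its descriptor roughly as follows. This is a sketch only: the struct layout and the names bitmap_descriptor, is_heap_pointer, and gc1_bitmap are hypothetical, and the fixed four-byte bitmap (at most 32 variables) is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-bit-per-variable descriptor. The layout is private
   to the front end and its run-time system; C-- never inspects it. */
struct bitmap_descriptor {
    unsigned var_count;     /* number of variables in the procedure */
    unsigned char bits[4];  /* bit i set => variable i is a heap pointer */
};

/* Returns 1 if variable var_index is a heap pointer, 0 otherwise. */
int is_heap_pointer(const struct bitmap_descriptor *d, unsigned var_index) {
    assert(d != NULL && var_index < d->var_count);
    return (d->bits[var_index / 8] >> (var_index % 8)) & 1;
}

/* Descriptor for f(x, y): x (index 0) is a pointer, y (index 1) is not. */
const struct bitmap_descriptor gc1_bitmap = { 2, { 0x01, 0, 0, 0 } };
```

Because only the front end and its collector read this block, switching from the byte-per-variable form to a bitmap would require no change on the C-- side.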

To associate the garbage-collection descriptor with f, the front end places the definition of f in a C-- span:

    span GC gc1 {
      f( bits32 x, bits32 y ) {
        ...code for f...
      }
    }

A span may apply to a sequence of function definitions, or to a sequence of statements within a function definition. In this case, the span applies to all of f. There may be several independent span mappings in use simultaneously, e.g., one for garbage collection, one for exceptions, one for debugging, and so on. C-- uses integer tokens to distinguish these mappings from one another; GC is the token in the example above. C-- takes no interest in the tokens; it simply provides a map from a (token, PC) pair to an address. Token values are usually defined using a const declaration (Section 3.6).

[1] This scenario presumes the front end has the privilege of reordering parameters; otherwise, it would have to use some other scheme for parameters.

When the garbage collector (say) walks the stack, using an activation handle a, it can call the following C-- run-time procedure:

void *GetDescriptor( activation *a, int token ) returns the address of the descriptor associated with the smallest C-- span tagged with token and containing the program point where the activation a is suspended.

There are no constraints on the form of the descriptor that gc1 labels; that form is private to the front end and its run-time system. All C-- does is transform span directives into mappings from program counters to values.

The front end may emit descriptors and spans to support other services, not just garbage collection. For example, to support exception handling or debugging, the front end may record the scopes of exception handlers or the names and types of variables. C-- supports multiple spans, but they must not overlap. Spans can nest, however; the innermost span bearing a given token takes precedence. One can achieve the effect of overlapping by binding the same data block to multiple spans.
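The "smallest enclosing span" rule can be made concrete with a small sketch. The table format, the struct span_entry, and the function lookup_descriptor below are hypothetical stand-ins for whatever a C-- run-time system keeps internally; the lookup is a plain linear scan chosen for clarity, not for speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical span-table entry covering the half-open PC range [lo, hi). */
struct span_entry {
    int token;          /* which mapping: GC, exceptions, debugging, ... */
    uintptr_t lo, hi;   /* range of program-counter values in the span */
    void *descriptor;   /* front-end data block bound to the span */
};

/* Return the descriptor of the smallest span tagged with token that
   contains pc, or NULL if no such span exists. */
void *lookup_descriptor(const struct span_entry *table, size_t n,
                        int token, uintptr_t pc) {
    const struct span_entry *best = NULL;
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct span_entry *s = &table[i];
        if (s->token != token || pc < s->lo || pc >= s->hi)
            continue;   /* wrong mapping, or pc lies outside this span */
        if (best == NULL || s->hi - s->lo < best->hi - best->lo)
            best = s;   /* nested spans: the innermost (smallest) wins */
    }
    return best ? best->descriptor : NULL;
}
```

Since spans with the same token may nest but not overlap, the smallest containing span is always the innermost one, so comparing range sizes is enough to pick it.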

5.4 Garbage collection

This section explains in more detail how the C-- run-time interface might be used to help implement a garbage collector. Our primary concern is how the collector finds, and possibly updates, roots. Other tasks, such as finding pointers in heap objects and compacting the heap, can be managed entirely by the front-end run-time system (allocator and collector) with no support from the back end. C-- takes no responsibility for heap pointers passed to code written in other languages. It is up to the front end to pin such pointers or to negotiate changing them with the foreign code. We defer until Section 6.1 the question of how control is transferred from running C-- code to the garbage collector.

To help the collector find roots in global variables, the front end can arrange to deposit the addresses of such variables in a special data section. To find roots in local variables, the collector must walk the activation stack. For each activation handle a, it calls GetDescriptor(&a, GC) to get the garbage-collection descriptor deposited by the front end. The descriptor tells it how many variables there are and which contain pointers. For each pointer variable, it gets the address of that variable by calling FindVar. If the result is NULL, the variable is dead, and need not be traced. Otherwise the collector marks or moves the object the variable points to, and it may redirect the variable to point to the object’s new location. Note that the collector need not know which variables were stored on the stack and which were kept in callee-saves registers; FindVar provides the location of the variable no matter where it is. Figure 2 shows a simple copying collector based on Appel (1989), targeted to the C-- run-time interface and the descriptors shown in Section 5.3.

    struct gc_descriptor {
      unsigned var_count;
      char heap_ptr[1];
    };

    void gc(void) {
      activation a;

      FirstActivation(tcb, &a);
      for (;;) {
        struct gc_descriptor *d = GetDescriptor(&a, GC);
        if (d != NULL) {
          int i;
          for (i = 0; i < d->var_count; i++)
            if (d->heap_ptr[i]) {
              typedef void *pointer;
              /* FindVar takes a pointer to the activation handle */
              pointer *rootp = FindVar(&a, i);
              if (rootp != NULL) *rootp = gc_forward(*rootp);
                  /* copying forward, as in Appel, if live */
            }
        }
        /* NextActivation returns an int: zero at the bottom of the stack */
        if (NextActivation(&a) == 0)
          break;
      }
      gc_copy(); /* from-space to to-space, as in Appel */
    }

Fig. 2. Part of a simple copying garbage collector

A more complicated collector might have to do more work to decide which variables represent heap pointers. TIL is the most complicated example we know of (Tarditi et al. 1996). In TIL, whether a parameter is a pointer may depend on the value of another parameter. For example, a C-- procedure generated by TIL might look like this:

    f( bits32 ty, bits32 a, bits32 b ) { ... }

The first parameter, ty, is a pointer to a heap-allocated type record. It is not statically known, however, whether a is a heap pointer. At run time, the first field of the type record that ty points to describes whether a is a pointer. Similarly, the second field of the type record describes whether b is a pointer.

To support garbage collection, we attach to f’s body a span that points to a statically allocated descriptor, which encodes precisely the information in the preceding paragraph. How this encoding is done is a private matter between the front end and the garbage collector; even this rather complicated situation is easily handled with no further support from C--.
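To make the TIL-style case concrete, here is one way a front end might encode "consult field k of the type record held in variable j" in its own descriptor. Everything in the sketch is an assumption for illustration: the names ptr_kind, til_var, and til_is_pointer, the modelling of a type record as an array of int flags, and the requirement that the front end keep ty live for as long as a or b is live.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-variable entry in a TIL-style descriptor. */
enum ptr_kind { NEVER_PTR, ALWAYS_PTR, ASK_TYPE_RECORD };

struct til_var {
    enum ptr_kind kind;
    int ty_var;   /* ASK_TYPE_RECORD: index of the variable holding ty */
    int field;    /* ASK_TYPE_RECORD: which field of the type record */
};

/* ty_loc is the address the collector obtained from FindVar for variable
   v->ty_var; it is consulted only for ASK_TYPE_RECORD entries. The type
   record is modelled as an array of flags, nonzero meaning "pointer". */
int til_is_pointer(const struct til_var *v, void *const *ty_loc) {
    switch (v->kind) {
    case ALWAYS_PTR:
        return 1;
    case ASK_TYPE_RECORD: {
        /* Assumption: the front end keeps ty live while it is needed. */
        assert(ty_loc != NULL && *ty_loc != NULL);
        const int *record = *ty_loc;
        return record[v->field] != 0;
    }
    default:
        return 0;
    }
}
```

The collector would call FindVar on the ty variable first and pass the result in; none of this requires C-- to know what a type record is.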

5.5 Implementing the C-- run-time interface

Can spans and the C-- run-time interface be implemented efficiently? By sketching a possible implementation, we argue that they can. Because the implementation is private to the back end and the back-end run-time system, there is wide latitude for experimentation. Any technique is acceptable provided it implements the semantics above at reasonable cost. We argue below that well-understood techniques do just that.

Implementing spans. The span mappings of Section 5.3 take their inspiration from table mappings for exception handling, and the key procedure, GetDescriptor, can be implemented in similar ways (Chase 1994a). The main challenge is to build a mapping from object-code locations (possible values of the program counter) to source-code location ranges (spans). The most common way is to use tables sorted by program counter. If suitable linker support is available, tables for different tokens can go in different sections, and they will automatically be concatenated at link time. Otherwise, tables can be chained together (or consolidated) by an initialisation procedure called when the program starts.
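A table sorted by program counter supports lookup by binary search. The sketch below assumes one table per token whose entries are disjoint half-open ranges sorted by starting address (nested spans would have to be flattened into such ranges when the table is built); the names pc_range and find_by_pc are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: PCs in [lo, hi) map to descriptor. */
struct pc_range {
    uintptr_t lo, hi;
    const void *descriptor;
};

/* Binary search over a table sorted by lo with disjoint ranges.
   Returns the matching descriptor, or NULL if pc is in no range. */
const void *find_by_pc(const struct pc_range *table, size_t n, uintptr_t pc) {
    size_t lo = 0, hi = n;   /* candidates are table[lo .. hi-1] */
    assert(table != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pc < table[mid].lo)
            hi = mid;        /* pc lies before this range */
        else if (pc >= table[mid].hi)
            lo = mid + 1;    /* pc lies after this range */
        else
            return table[mid].descriptor;
    }
    return NULL;
}
```

Each lookup costs time logarithmic in the number of ranges, which is the kind of "reasonable cost" the argument above relies on.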

Implementing stack walking. In our sketch implementation, the call stack is a contiguous stack of activation records. An activation handle is a static record consisting of a pointer to an activation record on the stack, together with pointers to the locations containing the values that the non-volatile registers[2] had at the moment when control left the activation record (Figure 3). FirstActivation initialises the activation handle to point to the topmost activation record on the stack and to the private locations in the C-- runtime that hold the values of registers. Depending on the mechanism used to suspend execution, the runtime might have values of all registers or only of non-volatile registers, but this detail is hidden behind the run-time interface. Ramsey (1992) discusses retargetable stack walking in Chapters 3 and 8.

The run-time system executes only when execution of C-- procedures is suspended. We assume that C-- execution is suspended only at a “safe point.” Broadly speaking, a safe point is a point at which the C-- run-time system is guaranteed to work; we discuss the details in Section 6.2. For each safe point, the C-- code generator builds a statically-allocated activation-record descriptor that gives:

– The size of the activation record; NextActivation can use this to move to the next activation record.

[2] The non-volatile registers are those registers whose values are unchanged after return from a procedure call. They include not only the classic callee-saves registers, but also registers like the frame pointer, which must be saved and restored but which aren’t always thought of as callee-saves registers.


[Figure 3 diagram. Labels: Activation handle; Pointer to an activation record; Pointer to first callee-save register; Pointer to second callee-save register; Activation record; Top of stack; Bottom of stack.]

The activation handle points to an activation record, which may contain values of some local variables. Other local variables may be stored in callee-saves registers, in which case their values are not saved in the current activation record, but in the activation records of one or more called procedures. These activation records can’t be determined until run time, so the stack walker incrementally builds a map of the locations of callee-save registers, by noting the saved locations of each procedure.

Fig. 3. Walking a stack

– The liveness of each local variable, and the locations of live variables, indexed by variable number. The “location” of a live variable might be an offset within the activation record, or it might be the name of a callee-saves register. FindVar uses this “location” to find the address of the true memory location containing the variable’s value, either by computing an address within the activation record itself, or by returning the address of the location holding the appropriate callee-saves register, as recorded in the activation handle (Figure 3).

– If the safe point is a call site, the locations where the callee is expected to put results returned from the call.

– The locations where the caller’s callee-saves registers may be found. Again, these may be locations within the activation record, or they may be this activation’s callee-saves registers. NextActivation uses this information to update the pointers-to-callee-saves-registers in the activation handle.
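The interplay between the activation-record descriptor and the activation handle can be sketched on a simulated stack. The code below is a toy model under several assumptions, not the implementation described here: the stack is an array of words with the caller at higher indices, two callee-saves registers are modelled, and each descriptor links directly to its caller's descriptor in place of a lookup by return address. All names (ar_descriptor, var_loc, find_var, next_activation) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef long word;
enum { NREGS = 2 };                  /* callee-saves registers modelled */
enum loc_kind { DEAD, IN_FRAME, IN_REG };

struct var_loc { enum loc_kind kind; int where; };  /* offset or reg number */

struct ar_descriptor {
    int size;                        /* words in the activation record */
    int nvars;
    const struct var_loc *vars;      /* indexed by variable number */
    int saved_at[NREGS];             /* frame offset holding the caller's value
                                        of register r, or -1 if untouched */
    const struct ar_descriptor *caller;  /* stand-in for a PC-based lookup */
};

struct activation {
    word *frame;                     /* base of the activation record */
    const struct ar_descriptor *d;
    word *reg[NREGS];                /* where each register's value lives */
};

/* Address of variable i in activation a, or NULL if the variable is dead. */
word *find_var(struct activation *a, int i) {
    assert(i >= 0 && i < a->d->nvars);   /* checked run-time error */
    const struct var_loc *v = &a->d->vars[i];
    switch (v->kind) {
    case IN_FRAME: return a->frame + v->where;
    case IN_REG:   return a->reg[v->where];
    default:       return NULL;
    }
}

/* Move the handle to the caller; returns 0 at the bottom of the stack. */
int next_activation(struct activation *a) {
    if (a->d->caller == NULL)
        return 0;
    for (int r = 0; r < NREGS; r++)
        if (a->d->saved_at[r] >= 0)  /* this activation saved register r */
            a->reg[r] = a->frame + a->d->saved_at[r];
    a->frame += a->d->size;          /* the caller's record is adjacent */
    a->d = a->d->caller;
    return 1;
}
```

A variable that the caller keeps in a callee-saves register is thus found in whichever callee's record last saved that register, which is why the handle carries the register pointers along as the walk proceeds.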


The C-- runtime can map activations to descriptors using the same mechanism it uses to implement the span mappings of Section 5.3. The run-time interface can cache these descriptors in the activation handle, so the lookup need be done only when NextActivation is called, i.e., when walking the stack. An alternative that avoids the lookup is to store a pointer to the descriptor in the code space, immediately following the call, and for the call to return to the instruction after the pointer. The SPARC C calling convention uses a similar trick for functions returning structures (SPARC 1992, Appendix D).

The details of descriptors and mapping of activations to descriptors are important for performance. At issue is the space overhead of storing descriptors and maps, and the time overhead of finding descriptors that correspond to PCs. Liskov and Snyder (1979) suggest that sharing descriptors between different call sites has a significant impact on performance. Because these details are private between the back end and the back-end run-time system, we can experiment with different techniques without changing the approach, the run-time interface, or the front end.

6 Refining the design

The basic idea of providing a run-time interface that allows the state of a suspended C-- computation to be inspected and modified seems quite flexible and robust. But working out the detailed application of this idea to a variety of run-time services, and specifying precisely what the semantics of the resulting language is, remains challenging. In this section we elaborate some of the details that were not covered in the preceding section, and discuss mechanisms that support run-time services other than garbage collection. Our design is not finalised, so this section is somewhat speculative.

6.1 Suspension and introspection

All our intended high-level run-time services must be able to suspend a C-- computation, inspect its state, and modify it, before resuming execution.

In many implementations of high-level languages, the run-time system runs on the same physical stack as the program itself. In such implementations, walking the stack or unwinding the stack requires a thorough understanding of system calling conventions, especially if an interrupt can cause a transfer of control from generated code to the run-time system. We prefer not to expose this implementation technique through the C-- run-time interface, but to take a more abstract view. The C-- runtime therefore operates as if the generated code and the run-time system run on separate stacks, as separate threads:

– The system thread runs on the system stack supplied by the operating system. The front-end run-time system runs in the system thread, and it can easily inspect and modify the state of the C-- thread.

– The C-- thread runs on a separate C-- stack. When execution of the C-- thread is suspended, the state of the C-- thread is saved in the C-- thread-control block, or TCB.

We have to say how a C-- thread is created, and how control is transferred between the system thread and a C-- thread.

– The system thread calls InitTCB to create a new thread. In addition to passing the program counter for a C-- procedure without parameters, the system thread must provide space for a stack on which the thread can execute, as well as space for a thread-control block.

– The system thread calls Resume to transfer control to a suspended C-- thread.

– Execution of a C-- thread continues until that thread calls the C-- procedure yield, which suspends execution of the C-- thread and causes a return from the system thread’s Resume call. The C-- thread passes a yield code, which is returned as the result of Resume.

For example, garbage collection can be invoked via a call to yield, when the allocator runs out of space. Here is how the code might look if allocation takes place in a single contiguous area, pointed to by a heap pointer hp, and bounded by heap_limit:

    f( bits32 a,b,c ) {
      while (hp+12 > heap_limit) {
        yield( GC ); /* Need to GC */
      }
      hp = hp+12;
      ...
    }

It may seem unusual, even undesirable, to speak of two “threads” in a completely sequential setting. In a more tightly-integrated system it would be more usual simply to call the garbage collector. But simply making a foreign call to the garbage collector will not work here. How is the garbage collector to find the top of the C-- portion of the stack that it must traverse? What if live variables (such as a, b, c) are stored in C’s callee-saves registers across the call to the garbage collector? Such complications affect not only the garbage collector, but any high-level run-time service that needs to walk the stack. Our two-thread conceptual model abstracts away from these complications by allowing the system thread to inspect and modify a tidily frozen C-- thread.

Using “threads” does not imply a high implementation cost. Though we call them threads, “coroutines” may be a more accurate term. The system thread never runs concurrently with the C-- thread, and the two can be implemented by a single operating-system thread.
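From the system thread's side, the Resume/yield protocol amounts to a dispatch loop over yield codes. The sketch below models that loop with a toy in place of a real C-- thread: toy_tcb, toy_resume, toy_gc, and run_to_completion are hypothetical names, toy_resume merely imitates a thread that allocates 12-byte objects and yields when the heap is full, and the toy collector simply treats the whole heap as garbage.

```c
#include <assert.h>

enum yield_code { YIELD_DONE, YIELD_GC };

/* Toy stand-in for a suspended C-- thread and its heap. */
struct toy_tcb {
    unsigned hp, heap_limit;  /* bump allocator state */
    int remaining;            /* allocations the "thread" still wants */
    int collections;          /* how many times the collector has run */
};

/* Stand-in for Resume: run until the thread yields; return the yield code. */
enum yield_code toy_resume(struct toy_tcb *t) {
    while (t->remaining > 0) {
        if (t->hp + 12 > t->heap_limit)
            return YIELD_GC;  /* corresponds to yield(GC) in the C-- code */
        t->hp += 12;
        t->remaining--;
    }
    return YIELD_DONE;
}

/* Toy collector: pretends every object is garbage and resets the heap. */
static void toy_gc(struct toy_tcb *t) {
    t->hp = 0;
    t->collections++;
}

/* System thread: resume the C-- thread, service each yield, repeat. */
int run_to_completion(struct toy_tcb *t) {
    assert(t->heap_limit >= 12);  /* otherwise collection can never help */
    for (;;) {
        switch (toy_resume(t)) {
        case YIELD_GC:
            toy_gc(t);            /* thread is frozen while this runs */
            break;
        case YIELD_DONE:
            return t->collections;
        }
    }
}
```

The two sides never run at the same time, which is the sense in which “coroutines” describes the arrangement better than “threads”.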

Another merit of this two-thread view is that it extends smoothly to accommodate multiple C-- threads. Indeed, though it is not the focus of this paper, we intend that C-- should support many very lightweight threads, in the style of Concurrent ML (Reppy 1991), Concurrent Haskell (Peyton Jones, Gordon, and Finne 1996), and many others.

6.2 Safe points

When can the system thread safely take control? We say a program-counter value within a procedure is a safe point if it is safe to suspend execution of the procedure at that point, and to inspect and modify its variables. We require the following precondition for execution in the front-end run-time system:

A C-- thread can be suspended only at a safe point.

A call to yield must be a safe point, and because any procedure could call yield, the code generator must ensure that every call site is a safe point. This safe point is associated with the state in which the call has been made and the procedure is suspended awaiting the return. C-- does not guarantee that every instruction is a safe point; recording local-variable liveness and location information for every instruction might increase the size of the program by a significant fraction (Stichnoth, Lueh, and Cierniak 1999).

So far we have suggested that a C-- program can only yield control voluntarily, through a yield call. What happens if an interrupt or fault occurs, transferring control to the front-end run-time system, and the currently executing C-- procedure is not at a safe point? This may happen if a user deliberately causes an interrupt, e.g., to request that the stack be unwound or the debugger invoked. It may happen if a hardware exception (e.g., divide by zero) is to be converted to a software exception. It may happen in a concurrent program if timer interrupts are used to pre-empt threads. The answer to the question remains a topic for research; asynchronous pre-emption is difficult to implement, not only in C-- but in any system. Chase (1994b) and Shivers, Clark, and McGrath (1999) discuss some of the problems. One common technique is to ensure that every loop is cut by a safe point, and to permit an interrupted program to execute until it reaches a safe point. C-- therefore enables the front end to insert safe points, by inserting the C-- statement

    safepoint;

6.3 Call-site invariants

In the presence of garbage collection and debugging, calls have an unusual property: live local variables are potentially modified by any call. For example, a compacting garbage collector might modify pointers saved across a call. Consider this function, in which a+8 is a common subexpression:

    f( bits32 a ) {
      bits32[a+8] = 10;  /* put 10 in 32-bit word at address a+8 */
      g( a );
      bits32[a+8] = 0;   /* put 0 in 32-bit word at address a+8 */
      return;
    }

If g invokes the garbage collector, the collector might modify a during the call to g, so the code generator must recompute a+8 after the call; it would be unsafe to save a+8 across the call. The same constraint supports a debugger that might change the values of local variables. Calls may also modify C-- values that are declared to be allocated on the stack.
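The hazard can be reproduced in miniature in C. In this sketch, which is an illustration only, two static byte arrays stand in for from-space and to-space, and the hypothetical function moving_call plays the role of g: a collection happens during the call, the object moves, and the root is redirected. The function store_around_call mirrors f above and reports whether an address cached before the call would have been stale.

```c
#include <assert.h>
#include <string.h>

static unsigned char from_space[16], to_space[16];

/* Stand-in for g: a compacting collection runs during the call, moves
   the object, and redirects the root, as a collector would via FindVar. */
static void moving_call(unsigned char **root) {
    memcpy(to_space, *root, sizeof to_space);
    *root = to_space;
}

/* Mirrors f from the text. Returns 1 if the address a+8 computed before
   the call no longer matches a+8 computed after it. */
int store_around_call(void) {
    unsigned char *a = from_space;
    unsigned char *cached = a + 8;  /* a+8 saved across the call: unsafe */
    a[8] = 10;
    moving_call(&a);
    assert(a == to_space);          /* the call changed a */
    a[8] = 0;                       /* a+8 recomputed from the new a */
    return cached != a + 8;
}
```

Had the second store gone through the cached address, it would have written into the abandoned from-space copy and left the live object holding 10.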

A compiler writer might reasonably object to the performance penalty imposed by this constraint; the back end pays for compacting garbage collection whether the front end needs it or not. To eliminate this penalty, the front end can mark C-- parameters and variables as invariant across calls, using the keyword invariant, thus:

    f( invariant bits32 a ) {
      invariant bits16 b;
      bits32 c;
      ...
      g( a, b, c ); /* "a" and "b" are not modified
                       by the call, but "c" might be */
      ...
    }

The invariant keyword places an obligation on the front-end run-time system, not on the caller of f. The keyword constitutes a promise to the C-- compiler that the value of an invariant variable will not change “unexpectedly” across a call. The run-time system and debugger may not change the values of invariant variables.

If variables will not be changed by a debugger, a front end can safely mark non-pointer variables as invariant across calls, and front ends using mostly-copying collectors (Bartlett 1988; Bartlett 1989a) or non-compacting collectors (Boehm and Weiser 1988) can safely mark all variables as invariant across calls.

7 Exceptions and other services

In Section 4 we argued that many high-level run-time services share at least some requirements in common. In general, they all need to suspend a running C-- thread, and to inspect and modify its state. The spans of Section 5.3 and the run-time interface of Section 5.2 provide this basic service, but each high-level run-time service requires extra, special-purpose support. Garbage collection is enhanced by the invariant annotation of Section 6.3. Exception handling requires rich mechanisms for changing the flow of control. Interrupt-based profiling requires the ability to inspect (albeit in a very modest way) the state of a thread interrupted asynchronously. Debugging requires all of the above, and more besides. We believe that our design can be extended to deal with these situations, and that many of C--’s capabilities will be used by more than one high-level service. Here we give an indicative discussion of just one other service, exception handling.

Making a single back end support a variety of different exception-handling mechanisms is significantly harder than supporting a variety of garbage collectors, in part because exceptions alter the control flow of the program. If raising an exception could change the program counter arbitrarily, chaos would ensue; two different program points may hold their live variables in different locations, and they may have different ideas about the layout of the activation record and the contents of callee-saves registers. They may even have different ideas about which variables are alive and which are dead. In other words, unconstrained, dynamic changes in locus of control make life hard for the register allocator and the optimiser; if the program counter can change arbitrarily, there is no such thing as dead code, and a variable live anywhere is live everywhere.

Typically, handling an exception involves first unwinding the stack to the caller of the current procedure, or its caller, etc., and then directing control to an exception handler. Many of the mechanisms used for garbage collection are also useful for exception handling; for example, stack walking and spans can be used to find exactly which handler should catch a particular exception. But the mechanisms we have described so far don’t allow for changes in control flow. C-- controls such changes by requiring annotations on procedure calls.

The key idea is that, in the presence of exceptions, a call might return to more than one location, and every C-- program specifies explicitly all the locations to which a call could return. In effect, a call has many possible continuations instead of just one. When an activation is suspended at a call site, three outcomes are possible.

– The call returns normally, and execution continues at the statement following the call.

– The call raises an exception that is handled in the activation, so the call terminates by transferring control to a different location in that activation.

– The call raises an exception that is not handled in the current activation, so the activation is aborted, and the run-time system transfers control to a handler in some calling procedure.

C--’s call-site annotations specify these outcomes in detail.

We are currently refining a design that supports suitable annotations, plus a variety of mechanisms for transfer of control (Ramsey and Peyton Jones 1999). Exception dispatch might unwind the stack one frame at a time, looking for a handler, or it might use an auxiliary data structure to find the handler, then “cut the stack” directly to that handler in constant time. Our design also permits exception dispatch to be implemented either in the front-end run-time system or in generated code. The C-- run-time system provides supporting procedures that can unwind the stack, change the address to which a call returns, and pass values to exception handlers.
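The frame-at-a-time strategy can be sketched as a search over per-activation exception descriptors, the kind of data a front end might bind to spans under an exception token of its own. The sketch is hypothetical throughout: the array scopes stands in for the results of calling GetDescriptor on each activation during a stack walk, and the names exn_scope and find_handler are not part of any C-- interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical exception descriptor: the exception tag this scope
   catches and an identifier for its handler. */
struct exn_scope {
    int catches;
    int handler_id;
};

/* scopes[0] describes the topmost activation and scopes[depth-1] the
   bottom-most; a NULL entry means no exception span covers the point
   where that activation is suspended. Returns the number of activations
   to unwind before reaching the handler and stores the handler's id,
   or returns -1 if no activation handles the exception. */
int find_handler(const struct exn_scope *const *scopes, int depth,
                 int tag, int *handler_id) {
    assert(handler_id != NULL);
    for (int i = 0; i < depth; i++) {
        if (scopes[i] != NULL && scopes[i]->catches == tag) {
            *handler_id = scopes[i]->handler_id;
            return i;
        }
    }
    return -1;
}
```

A dispatcher built this way takes time proportional to the depth of the handler; the “cut the stack” alternative trades that search for an auxiliary data structure maintained by the front end.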

8 Status and conclusions

The core design of C-- is stable, and an implementation based on ML-RISC is freely available from the authors. This implementation supports the features described in Section 3, but it does not yet include the span directive or a C-- run-time system.

Many open questions remain. What small set of mechanisms might support the entire gamut of high-level language exception semantics? How about the range of known implementation techniques? Support for debugging is even harder than dealing with exceptions. Exactly what is the meaning of a breakpoint? How should breakpoints interact with optimization? What are the primitive “hooks” required for concurrency support? How should C-- cope with pre-emption?

These questions are not easily answered, but the prize is considerable. Reuse of code generators is a critically important problem for language implementors. Code generators embedded in C compilers have been widely reused, but the nature of C makes it impossible to use the best known implementations of high-level run-time services like garbage collection, exception handling, debugging, and concurrency; C imposes a ceiling on reuse.

We hope to break through this ceiling by taking a new approach: design a low-level, reusable compiler-target language in tandem with a low-level, reusable run-time system. Together, C-- and its run-time system should succeed in hiding machine-dependent details of calling conventions and stack-frame layout. They should eliminate the distinction between variables living in registers and variables living on the stack. By doing so, they should

– Permitsophisticatedregisterallocation,evenin thepresenceof agarbagecollectoror debugger.

– Make the resultsof livenessanalysesavailableat run time, e.g.,to a garbagecol-lector.

– Supportthebestknown garbage-collectiontechniques,andpossiblyenableexper-imentationwith new techniques.

Althoughthedetailsarebeyondthescopeof thispaper, wehavesomereasonto believeC-- can also supportthe bestknown techniquesfor exceptionhandling,as well assupportingprofiling, concurrency, anddebugging.

Acknowledgements

We thank Xavier Leroy, Simon Marlow, Gopalan Nadathur, Mike O’Donnell, and Julian Seward for their helpful feedback on earlier drafts of this paper. We also thank Richard Black, Lal George, Thomas Johnsson, Greg Morrisett, Nikhil, Olin Shivers, and David Watt for feedback on the design of C--.

References

Appel, Andrew W. 1989 (February). Simple generational garbage collection and fast allocation. Software: Practice and Experience, 19(2):171–183.

Atkinson, Russ, Alan Demers, Carl Hauser, Christian Jacobi, Peter Kessler, and Mark Weiser. 1989 (July). Experiences creating a portable Cedar. Proceedings of the ’89 SIGPLAN Conference on Programming Language Design and Implementation, SIGPLAN Notices, 24(7):322–329.

Bartlett, Joel F. 1988 (February). Compacting garbage collection with ambiguous roots. Technical Report 88/2, DEC WRL, 100 Hamilton Avenue, Palo Alto, California 94301.

Bartlett, Joel F. 1989a (October). Mostly-copying garbage collection picks up generations and C++. Technical Report TN-12, DEC WRL, 100 Hamilton Avenue, Palo Alto, California 94301.

Bartlett, Joel F. 1989b. SCHEME to C: A portable Scheme-to-C compiler. Technical Report RR 89/1, DEC WRL.

Benitez, Manuel E. and Jack W. Davidson. 1988 (July). A portable global optimizer and linker. In ACM Conference on Programming Languages Design and Implementation (PLDI’88), pages 329–338. ACM.

Benton, Nick, Andrew Kennedy, and George Russell. 1998 (September). Compiling Standard ML to Java bytecodes. In ACM Sigplan International Conference on Functional Programming (ICFP’98), pages 129–140, Baltimore.

Bernstein, Robert L. 1985 (October). Producing good code for the case statement. Software: Practice and Experience, 15(10):1021–1024.

Boehm, Hans-Juergen and Mark Weiser. 1988 (September). Garbage collection in an uncooperative environment. Software: Practice and Experience, 18(9):807–820.

Chase, David. 1994a (June). Implementation of exception handling, Part I. The Journal of C Language Translation, 5(4):229–240.

Chase, David. 1994b (September). Implementation of exception handling, Part II: Calling conventions, asynchrony, optimizers, and debuggers. The Journal of C Language Translation, 6(1):20–32.

Clausen, LR and O Danvy. 1998 (April). Compiling proper tail recursion and first-class continuations: Scheme on the Java virtual machine. Technical report, Department of Computer Science, University of Aarhus, BRICS.

Conway, ME. 1958 (October). Proposal for an UNCOL. Communications of the ACM, 1(10):5–8.

Diwan, A, D Tarditi, and E Moss. 1993 (January). Memory subsystem performance of programs using copying garbage collection. In 21st ACM Symposium on Principles of Programming Languages (POPL’94), pages 1–14. Charleston: ACM.

Franz, Michael. 1997 (October). Beyond Java: An infrastructure for high-performance mobile code on the World Wide Web. In Lobodzinski, S. and I. Tomek, editors, Proceedings of WebNet’97, World Conference of the WWW, Internet, and Intranet, pages 33–38. Association for the Advancement of Computing in Education.

George, Lal. 1996. MLRISC: Customizable and reusable code generators. Unpublished report available from http://www.cs.bell-labs.com/george/.

Harbison, Samuel P. and Guy L. Steele, Jr. 1995. C: A Reference Manual. Fourth edition. Englewood Cliffs, NJ: Prentice Hall.

Henderson, Fergus, Thomas Conway, and Zoltan Somogyi. 1995. Compiling logic programs to C using GNU C as a portable assembler. In ILPS’95 Postconference Workshop on Sequential Implementation Technologies for Logic Programming, pages 1–15, Portland, Or.

Liskov, Barbara H. and Alan Snyder. 1979 (November). Exception handling in CLU. IEEE Transactions on Software Engineering, SE-5(6):546–558.

Macrakis, Stavros. 1993 (January). The Structure of ANDF: Principles and Examples. Open Systems Foundation.

Novack, Steven, Joseph Hummel, and Alexandru Nicolau. 1995. A Simple Mechanism for Improving the Accuracy and Efficiency of Instruction-level Disambiguation, Chapter 19, pages 289–303. Number 1033 in Lecture Notes in Computer Science. Springer-Verlag.

Pettersson, M. 1995. Simulating tail calls in C. Technical report, Department of Computer Science, Linköping University.

Peyton Jones, Simon L., A. J. Gordon, and S. O. Finne. 1996 (January). Concurrent Haskell. In 23rd ACM Symposium on Principles of Programming Languages (POPL’96), pages 295–308, St Petersburg Beach, Florida.

Peyton Jones, SL, D Oliva, and T Nordin. 1998. C--: A portable assembly language. In Proceedings of the 1997 Workshop on Implementing Functional Languages (IFL’97), Lecture Notes in Computer Science, pages 1–19. Springer Verlag.

Peyton Jones, Simon L. 1992 (April). Implementing lazy functional languages on stock hardware: The spineless tagless G-machine. Journal of Functional Programming, 2(2):127–202.

Ramsey, Norman and Simon L. Peyton Jones. 1999. Exceptions need not be exceptional. Draft available from http://www.cs.virginia.edu/nr.

Ramsey, Norman. 1992 (December). A Retargetable Debugger. PhD thesis, Princeton University, Department of Computer Science. Also Technical Report CS-TR-403-92.

Reig, F and SL Peyton Jones. 1998. The C-- manual. Technical report, Department of Computing Science, University of Glasgow.

Reppy, JH. 1991 (June). CML: a higher-order concurrent language. In ACM Conference on Programming Languages Design and Implementation (PLDI’91). ACM.

Serrano, Manuel and Pierre Weis. 1995 (September). Bigloo: a portable and optimizing compiler for strict functional languages. In 2nd Static Analysis Symposium, Lecture Notes in Computer Science, pages 366–381, Glasgow, Scotland.

Shivers, Olin, James W. Clark, and Roland McGrath. 1999 (September). Atomic heap transactions and fine-grain interrupts. In ACM Sigplan International Conference on Functional Programming (ICFP’99), Paris.

SPARC International. 1992. The SPARC Architecture Manual, Version 8. Englewood Cliffs, NJ: Prentice Hall.

Stallman, Richard M. 1992 (February). Using and Porting GNU CC (Version 2.0). Free Software Foundation.

Steele, GL. 1978. Rabbit: a compiler for Scheme. Technical Report AI-TR-474, MIT Lab for Computer Science.

Stichnoth, JM, G-Y Lueh, and M Cierniak. 1999 (May). Support for garbage collection at every instruction in a Java compiler. In ACM Conference on Programming Languages Design and Implementation (PLDI’99), pages 118–127, Atlanta.

Taft, Tucker. 1996. Programming the Internet in Ada 95. In Strohmeier, Alfred, editor, 1996 Ada-Europe International Conference on Reliable Software Technologies, Vol. 1088 of Lecture Notes in Computer Science, pages 1–16, Berlin. Available through www.appletmagic.com.

Tarditi, David, Anurag Acharya, and Peter Lee. 1992. No assembly required: compiling Standard ML to C. ACM Letters on Programming Languages and Systems, 1(2):161–177.

Tarditi, D, G Morrisett, P Cheng, C Stone, R Harper, and P Lee. 1996 (May). TIL: A type-directed optimizing compiler for ML. In ACM Conference on Programming Languages Design and Implementation (PLDI’96), pages 181–192. Philadelphia: ACM.

Wakeling, D. 1998 (September). Mobile Haskell: compiling lazy functional languages for the Java virtual machine. In Proceedings of the 10th International Symposium on Programming Languages, Implementations, Logics and Programs (PLILP’98), Pisa.
