advcomp slides juhe

Upload: vadriangmail

Post on 02-Jun-2018


  • 8/11/2019 Advcomp Slides Juhe

    1/27

    1 of 27

    Advanced Compiler Design and Implementation:

    Run-Time Support

    Juhana Helovuo

    Data type representations and instruction set support

    Register set and register usage

    Activation records and run-time stack

    Parameter passing modes

    Code for subroutine calls


    Shared object code

    Dynamic typing, heap management, function polymorphism


    Data type representations

    Fixed-size integers: word, halfword, byte

    How to treat integers with size < register size

    Example: Add 5 to signed byte @ sp+72

    Different sizes of loads, stores and arithmetic (M68k)

    addi.b #5, (72,a7) ; add immediate byte

    Sign/zero-extend on load instructions (Sparc)

    ldsb [%sp+72],%l2 ; load signed byte (and extend)

    add %l2, 5, %l2 ; add (32-bit)

    stb %l2, [%sp+72] ; store byte
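The load-extend, add, store-byte pattern above can be sketched in C. This is an illustrative sketch only, not code from the slides: the function names are hypothetical, and the byte in memory is modeled as a uint8_t.

```c
#include <stdint.h>

/* Load a byte and sign-extend it to 32 bits (what Sparc ldsb does in hardware). */
int32_t load_signed_byte(const uint8_t *p)
{
    int32_t v = *p;          /* zero-extended value, 0..255 */
    if (v >= 128)
        v -= 256;            /* reinterpret as two's complement, -128..127 */
    return v;
}

/* Load a byte and zero-extend it to 32 bits. */
int32_t load_unsigned_byte(const uint8_t *p)
{
    return *p;
}

/* Add k to a signed byte in memory: extend, add at full width, store the low byte. */
void add_to_signed_byte(uint8_t *p, int32_t k)
{
    int32_t v = load_signed_byte(p) + k;
    *p = (uint8_t)v;         /* the store keeps only the low 8 bits, like stb */
}
```

The addition itself is the same for signed and unsigned bytes; the choice of extension only matters when the widened value is compared or used at full width.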


    Sign/zero-extend and align with separate instructions (Alpha)

    ldq_u r2, 72(sp) ; load quadword unaligned -> r2

    lda r1, 72(sp) ; load address -> r1

    extbl r2, r1, r3 ; extract 1 byte from r2 -> r3

    mskbl r2, r1, r2 ; mask (clear) byte from r2

    addq r3, 5, r3 ; add quadword (64-bit)

    insbl r3, r1, r3 ; shift byte back in position

    or r2, r3, r2 ; combine result & rest of qword

    stq_u r2, (r1) ; write quadword back to memory

    The general case seems very complex, but case-specific optimizations often simplify this (register allocation, alignment, BWX)

    Very simple memory unit: Only aligned 64-bit loads & stores
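The extract / mask / insert sequence can be modeled in C on a single 64-bit value. This is a sketch under stated assumptions: the helper names are hypothetical, bytes are numbered little-endian within the quadword, and only the low three bits of the address select the byte.

```c
#include <stdint.h>

/* Extract byte (addr & 7) from a quadword, zero-extended (models extbl). */
static uint64_t ext_byte(uint64_t q, unsigned addr)
{
    return (q >> (8 * (addr & 7))) & 0xFF;
}

/* Clear byte (addr & 7) in a quadword (models mskbl). */
static uint64_t msk_byte(uint64_t q, unsigned addr)
{
    return q & ~((uint64_t)0xFF << (8 * (addr & 7)));
}

/* Shift the low byte of b into position (addr & 7) (models insbl). */
static uint64_t ins_byte(uint64_t b, unsigned addr)
{
    return (b & 0xFF) << (8 * (addr & 7));
}

/* Add delta to one byte of an aligned quadword, leaving the other bytes intact. */
uint64_t add_to_byte_in_quad(uint64_t quad, unsigned addr, unsigned delta)
{
    uint64_t b    = ext_byte(quad, addr);   /* the byte to update */
    uint64_t rest = msk_byte(quad, addr);   /* the quadword with that byte cleared */
    b += delta;                             /* full-width add */
    return rest | ins_byte(b, addr);        /* overflow beyond 8 bits is dropped */
}
```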

    For integer size > register size: Use two or four registers

    Architecture may provide double load for two consecutive

    registers (Sparc) or multiple load (ARM)


    Long arithmetic

    Use carry flag for addition & subtraction (Sparc)

    addcc %i1, %i3, %l0 ; add low words, generate carry

    addx %i2, %i4, %l1 ; add high words + carry

    Or use unsigned less than-comparison (Alpha)

    addq a0, a2, t0 ; add low words

    addq a1, a3, t1 ; add high words

    cmpult t0, a0, t2 ; generate carry: t2 = (t0 < a0)
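The carry-by-comparison trick can be written out in C. This is a minimal sketch, assuming a hypothetical two-word type u128; it relies only on the fact that unsigned addition wraps modulo 2^64, so the low sum is smaller than an operand exactly when a carry occurred.

```c
#include <stdint.h>

/* Hypothetical 128-bit integer built from two 64-bit words. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128;

u128 add128(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;               /* add low words (wraps modulo 2^64) */
    uint64_t carry = (r.lo < a.lo);   /* unsigned less-than recovers the carry */
    r.hi = a.hi + b.hi + carry;       /* add high words plus carry */
    return r;
}
```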


    Character strings

    C-style strings: Array of characters, end of string marked by character code 0

    Pascal-style strings: Character count (integer) followed by an

    array of characters
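The two representations differ mainly in how the length is found. The sketch below is illustrative: the pstring layout (a one-byte count followed by up to 255 characters) is an assumption chosen for the example, not a layout given in the slides.

```c
#include <stddef.h>

/* C-style: scan for the terminating zero, cost proportional to the length. */
size_t c_style_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Pascal-style (hypothetical layout): the count is stored in front of the characters. */
typedef struct {
    unsigned char len;
    char chars[255];
} pstring;

/* Pascal-style: the length is read directly, in constant time. */
size_t pascal_style_length(const pstring *s)
{
    return s->len;
}
```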

    Instruction set support

    x86: store string or move string instructions + repeat prefix,

    byte-sized operations

    Sparc: byte loads and stores

    Alpha: insert, extract, mask, zap, cmpbge

    PowerPC: load/store string (and compare)


    Pointers

    Usually 32/64-bit words (same as register size)

    Naturally aligned: pointer mod sizeof(pointed data) = 0

    Array access often requires pointer arithmetic

    base pointer + (index * element size)

    Element size is often 4 or 8

    Special support for address computation

    ARM: Data path for second operand contains a shifter unit

    Alpha: s4add, s8add, s4sub, s8sub

    PowerPC, ARM, Sparc: Indexed addressing mode

    lwzx r0,r9,r2 ; r0 := M[r9+r2] (PowerPC)

    ld [%i2+%i3], %l1 ; l1 := M[i2+i3] (Sparc)
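The address computation behind an array access can be sketched in C with plain integer arithmetic. The function names are hypothetical; scaled4_add is modeled on the Alpha scaled-add idea, where the index is multiplied by 4 with a shift before the base is added.

```c
#include <stdint.h>

/* General form: base pointer + (index * element size). */
uintptr_t element_address(uintptr_t base, uintptr_t index, uintptr_t elem_size)
{
    return base + index * elem_size;
}

/* Element size 4: the multiplication becomes a shift by 2. */
uintptr_t scaled4_add(uintptr_t index, uintptr_t base)
{
    return (index << 2) + base;
}

/* Natural alignment: the address is a multiple of the size of the pointed data. */
int is_naturally_aligned(uintptr_t address, uintptr_t size)
{
    return address % size == 0;
}
```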


    Register Usage

    Typical RISC has 32 integer registers

    (ARM: 16, Itanium: 128, Sparc: register windows, x86: ~8)

    Compiler typically has several uses for registers

    stack pointer and frame pointer

    global offset table pointer (global pointer)

    dynamic link and static link

    call arguments and return values

    local variables

    frequently used global variables

    temporary values


    ...Register Usage

    The compiler should maximize the use of the register set in

    order to avoid memory accesses

    The partitioning of the register set may be partially

    determined by

    ISA (Instruction Set Architecture = hardware platform) and

    ABI (Application Binary Interface = system software)

    ISA usually defines or recommends a stack pointer, possibly

    also frame pointer and link register

    ABI may define argument and return value registers

    ABI must be followed to maintain interoperability with other

    compilers and libraries


    Register partitioning example (Alpha)

    v0 = return value, a0..a5 = call arguments, ra=return address

    s0..s5 = local/global variables, preserved across calls

    t0..t11 = local variables and temporaries, not preserved

    pv = call address, gp = global pointer, AT = assembler temp.

    [Figure: Alpha register map. r0 = v0; r1..r8 = t0..t7; r9..r14 = s0..s5; r15 = fp; r16..r21 = a0..a5; r22..r25 = t8..t11; r26 = ra; r27 = pv; r28 = AT; r29 = gp; r30 = sp; r31 = zero]


    The Run-Time Stack

    The run-time stack is used to store activation records (stack frames)

    Activation records represent procedure activations and they may contain

    dynamic link and static link

    call arguments and return values

    local variables

    saved registers (by caller and callee)

    procedure call return address

    The stack is maintained and accessed through the stack

    pointer register, often also by the frame pointer

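    [Figure: run-time stack. sp and fp delimit the current frame, which lies next to the previous frame; slots in the current frame are addressed as sp+N or fp-M]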


    The activation record is used to communicate between the

    caller (main program) and callee (subroutine)

    These procedures may be compiled separately

    The compiler must adhere to a call convention, or a

    procedure call protocol

    Parts of the activation record are constructed by the caller

    and some parts by the callee

    Only the caller may know the size of argument list (C)

    Only the callee knows the storage required for local

    variables

    Both have to be able to access arguments, return value and

    links (dynamic, static, return address)


    Links in Stack Frame

    Dynamic Link

    Used to find the calling stack frame on return

    If the frame size is fixed and static, then there is no need for this. Just use a constant offset in the code

    Static Link

    Used to find the last activation of the static parent of the

    current frame

    Required only in languages allowing nested, local

    procedures (e.g. Pascal, Ada, not in C)

    Return Address

    Used to find the code of the caller on procedure exit

    RISCs store return address into a link register on call (jump-and-link) instruction
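Since C has no nested procedures, the role of the static link can only be shown by passing it explicitly. The sketch below is a hand-written model of what a compiler for a language with nested procedures might generate; all names are hypothetical.

```c
/* Frame of the enclosing (static parent) procedure. */
struct outer_frame {
    int x;                             /* local variable of outer */
};

/* Frame of the nested procedure: it carries a static link to its parent's frame. */
struct inner_frame {
    struct outer_frame *static_link;
    int y;                             /* argument of inner */
};

/* The nested procedure reaches the parent's local variable through the static link. */
int inner(struct outer_frame *static_link, int y)
{
    struct inner_frame frame = { static_link, y };
    return frame.static_link->x + frame.y;
}

/* The parent passes the address of its own frame as the static link. */
int outer(int x)
{
    struct outer_frame frame = { x };
    return inner(&frame, 10);
}
```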


    Parameter passing modes

    Call by value: Argument value is copied into the callee. The original variable of the caller is not modified during the call.

    Default in most languages (except Fortran and Perl)

    Call by result: Argument is copied from the callee to the caller. Used to return values.

    Call by value-result: Argument is copied both ways.

    Call by reference: Callee gets a reference (pointer) to a memory location holding the argument. Callee can modify the argument.

    Call by name: Like call by reference, but the argument pointer expression is recomputed at each access.

    The callee is passed a small anonymous function to compute the address of the argument.
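C itself only has call by value, so the other modes are emulated here with pointers and a function pointer. This is a sketch with hypothetical names; the call-by-name part models the "small anonymous function" as an explicit thunk plus an environment struct.

```c
/* Call by value: the callee works on a copy, the caller's variable is unchanged. */
void inc_by_value(int n)
{
    n = n + 1;
    (void)n;
}

/* Call by reference: the callee modifies the caller's variable through a pointer. */
void inc_by_reference(int *n)
{
    *n = *n + 1;
}

/* Call by value-result: copy in, work on the local copy, copy out on return. */
void inc_by_value_result(int *n)
{
    int local = *n;
    local = local + 1;
    *n = local;
}

/* Call by name: the argument expression a[i] is re-evaluated at every access. */
struct name_env {
    int *a;
    int *i;
};

/* Thunk: computes the address of a[i] using the current value of i. */
int *elem_thunk(void *env)
{
    struct name_env *e = env;
    return &e->a[*e->i];
}

/* Sums the by-name argument while stepping i, so each access sees a new element. */
int sum_by_name(int *(*thunk)(void *), void *env, int *i, int n)
{
    int total = 0;
    for (*i = 0; *i < n; (*i)++)
        total += *thunk(env);
    return total;
}
```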


    Procedure Call and Return

    Caller's view of a subroutine call

    Call

    1. Evaluate each argument and place them in argument registers or stack frame

    2. Determine the address of the subroutine (mostly done by the linker)

    3. Store caller-save registers in stack frame

    4. Compute a static link for the subroutine, if necessary

    5. Save the return address and jump to the subroutine

    Return

    1. Restore saved registers from stack

    2. Use the return value


    Epilogue and Prologue

    Callee's view of the call

    Prologue

    1. Save frame pointer, copy stack pointer to frame pointer, compute new stack pointer, i.e. allocate new stack frame

    2. Save callee-save registers, if necessary

    3. Construct a display (cache of static links), if necessary

    Procedure body is executed between the prologue and the

    epilogue

    Epilogue

    1. Restore saved callee-save registers

    2. Restore SP from frame pointer and FP from dynamic link

    3. Place return value in appropriate register or stack location

    4. Jump to return address
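The frame-pointer bookkeeping in the prologue and epilogue can be simulated on a small array standing in for the stack. This is a sketch under stated assumptions: the machine type and function names are hypothetical, the stack grows toward lower indices, and register saving and the return jump are left out.

```c
#include <stddef.h>

enum { STACK_WORDS = 64 };

/* Hypothetical machine state: a word-addressed stack plus sp and fp. */
typedef struct {
    long   mem[STACK_WORDS];
    size_t sp;                 /* index of the lowest used word */
    size_t fp;                 /* index of the saved frame pointer of this frame */
} machine;

void machine_init(machine *m)
{
    m->sp = STACK_WORDS;       /* empty stack */
    m->fp = STACK_WORDS;
}

/* Prologue: save old fp (the dynamic link), set fp = sp, allocate the locals. */
void prologue(machine *m, size_t local_words)
{
    m->mem[--m->sp] = (long)m->fp;
    m->fp = m->sp;
    m->sp -= local_words;
}

/* Epilogue: restore sp from fp, then restore fp from the dynamic link. */
void epilogue(machine *m)
{
    m->sp = m->fp;
    m->fp = (size_t)m->mem[m->sp++];
}
```

After a matching prologue and epilogue pair, sp and fp hold the values they had before the call, regardless of how many local words were allocated.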


    Call Example

    Sample C code

    int test_proc(int a1, int a2)

    {

    int lv1, lv2;

    ...

    return ...;

    }

    ...

    r = test_proc(r,4);

    Subroutine with two int parameters and two int locals


    PowerPC calling convention (MacOS X)

    stack frames are of static and fixed size

    no frame pointer

    callee saves as many registers as it uses

    frame contains outgoing arguments (incoming arguments in previous frame)

    callee may store incoming args in caller's frame if it needs them in memory

    [Figure: Register partitioning. r0 = zero/temp; r1 = stack ptr; r2 = temp; r3 = arg0/ret.v.; r4 = arg1/ret.v.; r5 = arg2 ...; r10 = arg7; r11 = temp; r12 = indir. branch target; r13..r31 = local variables; special registers: link, count, cond, exception]

    [Figure: Stack frame structure, starting at SP. SP: old SP; saved cond; saved link; ???; SP+24: arg0 ... argN (outgoing args); local variables and temps in memory; saved registers; old SP: prev. frame]


    PowerPC assembly for example call

    Prologue and epilogue

    _test_proc:

    mflr r0 ; r0 <- link register


    Procedure-valued variables

    Rare in imperative languages, routine in functional languages

    C provides function pointers

    Simple to implement as plain code pointers

    This is sufficient, since there are no local procedures

    Nested procedures require procedure values to contain both

    code pointer and static link (=closure)

    Static link is required to find the local variables of enclosing

    scope

    Now activation records may have to live even after the

    function execution has ended. Stack allocation is not sufficient for all procedures
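A closure can be sketched in C as a code pointer paired with the data it needs from the enclosing scope. This is an illustrative model with hypothetical names; the captured value is stored in a heap-allocated record because it must outlive the function that created it, which is the point made above about stack allocation.

```c
#include <stdlib.h>

/* Procedure value: code pointer plus captured environment. */
typedef struct closure {
    int (*code)(struct closure *self, int arg);
    int captured;                      /* stands in for the enclosing scope's variable */
} closure;

/* Body of the procedure value: reads its environment through self. */
static int add_captured(closure *self, int arg)
{
    return self->captured + arg;
}

/* Creates a procedure value that adds n; the record outlives this call. */
closure *make_adder(int n)
{
    closure *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->code = add_captured;
    c->captured = n;
    return c;
}

/* Calling a procedure value: jump through the code pointer, pass the environment. */
int call_closure(closure *c, int arg)
{
    return c->code(c, arg);
}
```

The caller owns the returned record and releases it with free when the procedure value is no longer needed.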


    Position-Independent Code

    Required for shared libraries and, more generally, for any dynamically loadable code, e.g. plugin modules

    Only one copy of shared code in memory => code cannot be modified at load time

    PIC must be loadable to an arbitrary memory location

    Code and data references must work regardless of code

    location

    Local data references are SP-based => ok

    Jumps within the same object module can use relative addressing => ok

    Global data references and jumps from one object module to another cannot be absolute => use indirect addressing


    Global Offset Table

    Global Offset Table (GOT) is a pointer table used to point to global symbols, whose addresses are not known until

    program load time.

    Data References

    The compiler generates indirect references through the GOT

    The link-editor relocates the reference as a GOT offset

    The run-time linker fills the GOT with actual symbol

    addresses, when it knows where the object will be loaded

    Code References

    Calls to shared code jump to an element of Procedure

    Linkage Table (PLT)

    PLT element contains code to load an address from GOT and

    a jump to that address
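The data-reference side of this scheme can be modeled in C as a table of pointers that is filled in once the addresses are known. This is only an analogy with hypothetical names: a real GOT is laid out by the link-editor and filled by the run-time linker, not by program code.

```c
#include <stddef.h>

/* Slot numbers play the role of the GOT offsets assigned by the link-editor. */
enum { GOT_SI = 0, GOT_SIZE = 1 };

/* The table itself: one pointer per global symbol, empty until "load time". */
static void *got[GOT_SIZE];

/* Stands in for a global variable si defined in another object. */
static int si_storage = 42;

/* What the run-time linker does: store the actual symbol address in the table. */
void runtime_link(void)
{
    got[GOT_SI] = &si_storage;
}

/* What compiled code does: load the address from the table, then load the value. */
int load_si(void)
{
    int *address = got[GOT_SI];
    return *address;
}
```

The code in load_si contains no absolute address of si, only a table slot number, so it stays valid wherever the data ends up.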


    GOT Example (Sparc)

    Procedure prologue

    .LLGETPC0: ; helper function

    retl ; to read program counter

    add %o7, %l7, %l7 ; %l7 += return address

    so_func: ; actual procedure start

    save %sp, -112, %sp ; allocate stack frame

    sethi %hi(_GLOBAL_OFFSET_TABLE_-4), %l7

    call .LLGETPC0

    add %l7, %lo(_GLOBAL_OFFSET_TABLE_+4), %l7

    ; now %l7 contains the address of GOT

    ; _GLOBAL_OFFSET_TABLE_ is a PC-relative symbol

    Loading from global data symbol si

    ; symbol si has relocation type GOT, i.e. it is treated as

    ; an offset into GOT, not actual memory address

    sethi %hi(si), %g1

    or %g1, %lo(si), %g1 ; %g1 = GOT offset for si

    ld [%l7+%g1], %g1 ; load address of si from GOT

    ld [%g1], %i0 ; load value of si


    Calling via PLT

    call so_aux, 0 ; looks normal, but symbol so_aux

    ; has relocation type PLT

    ; linker relocates this to .PLT2

    The call is to object module-local PLT, not actual subroutine

    in another object

    PLT

    .PLT2: ; 2: run-time linked entry

    sethi (. - .PLT0), %g1

    sethi %hi(so_aux), %g1

    jmp %g1+%lo(so_aux)

    .PLT3: ; 3: not yet linked

    sethi (. - .PLT0), %g1

    ba,a .PLT0

    nop

    .PLT0: ; 0: first entry

    save %sp, -64, %sp

    call dyn_linker

    Code from shared object

    so_aux:

    save ......
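The difference between a not-yet-linked and a run-time linked PLT entry can be imitated in C with a function pointer that starts out pointing at a resolver. This is a loose analogy with hypothetical names: a real PLT entry is patched machine code, and the resolver is the dynamic linker.

```c
typedef int (*aux_fn)(int);

/* Stands in for the real subroutine living in another shared object. */
static int so_aux_impl(int x)
{
    return x * 2;
}

static int resolve_and_call(int x);

/* The "PLT entry": initially it leads to the resolver (not yet linked). */
static aux_fn plt_slot = resolve_and_call;
static int resolver_calls = 0;

/* First call: look up the target, patch the entry, then continue to the target. */
static int resolve_and_call(int x)
{
    resolver_calls++;
    plt_slot = so_aux_impl;            /* the entry is now run-time linked */
    return plt_slot(x);
}

/* What the caller does: always jump through the module-local entry. */
int call_so_aux(int x)
{
    return plt_slot(x);
}

/* Number of times the resolver ran, exposed for inspection. */
int resolver_call_count(void)
{
    return resolver_calls;
}
```

Only the first call pays for the lookup; later calls go through the patched entry directly.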


    Dynamic typing and polymorphism

    Dynamic typing: The programming language does not associate types with variables, but rather with data values

    Variable name can refer to value of any type

    Dynamic typing is usually implemented by tagging data values. Each value carries a type tag with it.

    The compiler should generate efficient code for resolving the types of data values and selecting the corresponding (polymorphic) operation on them, e.g. a+b on integers, floats, strings or bignums.

    Modern architectures have very little built-in hardware

    support for this

    Sparc provides tagged add and subtract instructions
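Tagged values and tag-based dispatch can be sketched in C with a struct holding a tag and a union. This is a minimal illustration with hypothetical names and only two types; real implementations often pack the tag into spare bits of the value instead.

```c
/* Type tag carried by every value. */
typedef enum { TAG_INT, TAG_FLOAT } tag;

/* A dynamically typed value: the tag says which union member is valid. */
typedef struct {
    tag t;
    union {
        long   i;
        double f;
    } u;
} value;

value make_int(long i)
{
    value v;
    v.t = TAG_INT;
    v.u.i = i;
    return v;
}

value make_float(double f)
{
    value v;
    v.t = TAG_FLOAT;
    v.u.f = f;
    return v;
}

/* Polymorphic a+b: inspect the tags, then select the matching operation. */
value add_values(value a, value b)
{
    if (a.t == TAG_INT && b.t == TAG_INT)
        return make_int(a.u.i + b.u.i);            /* integer addition */
    double x = (a.t == TAG_INT) ? (double)a.u.i : a.u.f;
    double y = (b.t == TAG_INT) ? (double)b.u.i : b.u.f;
    return make_float(x + y);                      /* mixed or float addition */
}
```

The tag checks are the run-time cost of dynamic typing; a compiler that can prove both operands are integers may skip them.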


    Storage management

    Fully manual: malloc and free in C, or new and delete in C++

    Automatic deallocation: new but no delete in Java

    Fully automatic: All memory management operations implicit, in e.g. Lisp or Haskell

    Automatic deallocation is usually based on reference

    counting, garbage collection, or combination of both

    Manual allocation is usually implemented as a library call

    e.g. Doug Lea's dlmalloc library has been shown to

    outperform custom memory allocation routines
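Reference counting, one of the bases for automatic deallocation named above, can be sketched on top of malloc and free. The names are hypothetical and the sketch ignores cycles, which plain reference counting cannot reclaim.

```c
#include <stdlib.h>

/* Heap object with an embedded reference count. */
typedef struct {
    int refcount;
    int payload;
} object;

/* Allocation: the creator holds the first reference. */
object *obj_new(int payload)
{
    object *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->refcount = 1;
    o->payload = payload;
    return o;
}

/* A new reference to the object is created. */
void obj_retain(object *o)
{
    o->refcount++;
}

/* A reference is dropped; returns 1 if this freed the object, 0 otherwise. */
int obj_release(object *o)
{
    if (--o->refcount == 0) {
        free(o);
        return 1;
    }
    return 0;
}
```

In a language with automatic deallocation, the retain and release calls are inserted by the compiler or run-time system rather than written by the programmer.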


    Summary

    Language semantics and run-time services (dynamic loading, code sharing, memory management) may require

    complicated run-time support

    It should be possible to optimize away costly parts of the

    procedure call mechanism to obtain good call performance.

    The amount of required run-time support code depends on

    the language and hardware.

    Modern RISCs do not have much explicit architectural support for specific high-level languages, but this can be

    compensated for in software