NCI Report: Zephyr

PLDI NCI Tutorial

University of Virginia

Princeton University

6/16/2000 PLDI NCI Tutorial 2

Zephyr Goals

• Goal
– Deliver high-quality, language-neutral tools for rapidly constructing compilers for experimental computing systems research
• How
– Provide specification languages and processors to automatically generate key compiler components
  • Don’t write code, write specifications!

Zephyr Compilers

[Diagram: the SUIF and Zephyr infrastructures side by side. Front ends: EDG C++, Java, lcc. SUIF side (with MachSUIF): interprocedural analysis, parallelization and locality opts, object-oriented opts, scheduling, register allocation. A SUIF-to-VPO bridge connects the two. Zephyr side (VPO): instruction selection, register allocation, code motion, memory access coalescing, induction variable elimination, CSE, loop unrolling, inlining. Targets shown: Alpha, Sparc, MIPS, X86.]

Zephyr Building Blocks

• ASDL: Abstract Syntax Description Language
• VPO: Very Portable Optimizer
• CSDL: Computer System Description Language

ASDL: Abstract Syntax Description Language

[Diagram: compiler pipeline. Lexer → tokens → Parser → AST → Semantic Analysis → AST → Translate → IR → OPT1 → IR → ... → OPTn → IR → CodeGen. A Glue Generator, driven by a Glue Description, covers the AST and IR hand-offs between phases.]

ASDL

• ASDL makes it easy to communicate complex recursive data structures
• ASDL and its tools provide
– Concise descriptions of tree-like structures, including ASTs and compiler intermediate representations (IRs)
– Automatic generation of data structure implementations and pickling functions for C, C++, Java, Standard ML, and Haskell
– Graphical browsing and editing of data structures on disk

ASDL

• For more information about ASDL see:
– Give reference here
– Give URL here

VPO: Very Portable Optimizer

• VPO is a retargetable optimizer that operates on a low-level, machine-independent representation called RTLs (register transfer lists)

• VPO is retargeted by providing a machine description (MD) of the target machine, and revising a few machine-dependent routines

• VPO is small, easily extended, and extremely effective


History Lesson

• PO developed in 1981
– Pioneered use of RTLs
– Demonstrated ability to do optimizations on low-level representation
• Development split in 1982
– gcc development
  • Richard Stallman and Len Tower
– VPO development
  • Many people at UVa and a few industrial labs

[Diagram: PO splits into VPO and gcc]

Register Transfer Lists

• Based on Bell and Newell's ISP notation
• Machine-independent representation of a machine-dependent operation
• Algorithms that manipulate RTLs are machine-independent

Register Transfer Lists

• While assembly language notations may vary, RTLs are very similar across architectures

Example:

Instruction         Machine
add %o1,%o2,%o2     SPARC
addu $10,$10,$9     MIPS
ar 10,9             IBM

In RTL, each operation would be represented as:

r[10] = r[10] + r[9];

RTLs

• The form of RTLs is fixed
  • dst = src ; dst = src ; dst = src …
– The individual register transfers are performed in parallel
– Example
  • r[1] = r[1] + r[2] ; NZ = r[1] + r[2] ? 0
– VPO provides machine-independent primitives for operating on and manipulating RTLs
  • Obtain the sources and destinations
  • Obtain the memory locations read and written
  • Obtain the type of instruction (arithmetic, branch, control transfer, etc.)
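The parallel semantics of a register transfer list can be illustrated with a small sketch. This is not VPO code: the `Transfer` struct, the `exec_rtl` function, and the restriction to transfers of the form `r[dst] = r[a] + r[b]` are assumptions made only for illustration. The point is that every source is read from the old machine state before any destination is written.

```c
#include <assert.h>
#include <stddef.h>

#define NREGS 8
#define MAXXFERS 16

/* One register transfer of the toy form r[dst] = r[a] + r[b]
 * (hypothetical representation, not VPO's). */
typedef struct {
    int dst, a, b;
} Transfer;

/* Execute a list of transfers with parallel semantics: all sources are
 * evaluated against the old state, then all destinations are written. */
void exec_rtl(int r[NREGS], const Transfer *list, size_t n)
{
    int result[MAXXFERS];
    assert(n <= MAXXFERS);
    for (size_t i = 0; i < n; i++)      /* phase 1: evaluate every source */
        result[i] = r[list[i].a] + r[list[i].b];
    for (size_t i = 0; i < n; i++)      /* phase 2: commit every destination */
        r[list[i].dst] = result[i];
}
```

With `r[1] = 5` and `r[2] = 7`, the list `r[1] = r[1] + r[2] ; r[3] = r[1] + r[2]` leaves both `r[1]` and `r[3]` equal to 12; executing the two transfers one after the other would instead give `r[3] = 19`.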

RTLs

• Think of RTL as a machine-independent assembly language
– For a machine X, each RTLx describes an instruction in X’s instruction set (may be a synthetic instruction)
– RTLx should specify
  • the instruction’s inputs and outputs
  • the transformation the instruction makes on the machine state
– VPO uses this information to compute a dataflow graph

Compilation with VPO

[Diagram: Source Code → Front and Middle Ends → RTL → VPO (Mach) → Machine Code]

You supply the front end and a simple code generator, we supply an optimizing back end

Generating RTLx

• Translate IL ops to semantically equivalent sequences of instructions for the target machine
– Generate RTL representation of instructions, not assembly language
– Do not worry about code quality
  • Perform naïve, straightforward translation
  • Expose all computations (even effective address computations) to VPO
  • Use virtual or pseudo registers for temporaries
  • VPO handles activation record and data placement

Generating RTLx

The C code: K = I + 1;

[Tree: = <int,32> has children ADDR K <local,32> and + <int,32>; + has children @ <int,32> (applied to ADDR I <local,32>) and CON 1 <int,32>]

IL            SPARC RTL
ADDR int K    r[33]=r[14]+K.;
ADDR int I    r[34]=r[14]+I.;
@ int         r[35]=M[r[34]];       r[34]
CON int 1     r[36]=1;
+ int         r[37]=r[35]+r[36];    r[35]:r[36]
= int         M[r[33]]=r[37];       r[33]:r[37]

VPO design rationale

• All "traditional" optimizations performed at the machine level on a single representation: RTL
– most optimizations are machine-dependent
– better code is produced
– instruction selection can be performed on demand
– avoids phase ordering problems
– simplifies implementation of optimizations
– easier to accommodate emerging architectures
– "plug and play" structure

RTLs in VPO

• VPO optimization algorithm
– repeat
    apply code-improving transformation
  until fixed-point reached or exhausted registers
• Maintaining two invariants
– Semantic invariant (S)
  • Observable behavior of program unchanged (according to RTL semantics)
– Machine invariant (M)
  • Every RTL equivalent to one machine instruction

VPO code improvements

• Each code-improving transformation is
– machine-level, but
– machine-independent
• Any semantics-preserving transformation is OK
• Preserve machine invariant (M) using machine description:
– for each new RTL produced, ask MD if OK
– if any is not a target machine instruction, roll back transformation
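The "ask the MD, roll back if not OK" step can be sketched as follows. Everything here is hypothetical and chosen only for illustration: the `AddImm` struct, `md_is_legal`, and `try_fold_constant` are not VPO functions, and the legality rule (a 13-bit signed immediate, as on SPARC) merely stands in for a real machine description.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy RTL of the form r[dst] = r[src] + imm (hypothetical representation). */
typedef struct {
    int dst, src;
    long imm;
} AddImm;

/* Stand-in for the machine description: accept only immediates that fit
 * a SPARC-style 13-bit signed field. */
bool md_is_legal(const AddImm *rtl)
{
    return rtl->imm >= -4096 && rtl->imm <= 4095;
}

/* Merge a further constant addend into the immediate. The rewrite is kept
 * only if the machine description accepts the new RTL; otherwise the
 * original RTL is restored. */
bool try_fold_constant(AddImm *rtl, long constant)
{
    AddImm saved = *rtl;        /* remember the original RTL */
    rtl->imm += constant;       /* semantics-preserving rewrite */
    if (!md_is_legal(rtl)) {
        *rtl = saved;           /* roll back: not a target instruction */
        return false;
    }
    return true;
}
```

The transformation itself never inspects the target; only `md_is_legal` does, which is the sense in which a transformation can be machine-level yet machine-independent.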

Code improvement catalog

• Register assignment and allocation
• Common subexpression elimination
• Induction variable elimination
• Code motion
• Constant propagation
• Copy propagation
• Memory access coalescing
• Recurrence detection
• Instruction scheduling
• Dead code elimination
• Constant folding
• Loop unrolling
• Branch minimization
• Evaluation order determination

VPO Optimizations

• Common subexpression elimination
  • Davidson, J. W. and Fraser, C. W., 'Eliminating Redundant Object Code', in Conference Record of the Ninth Annual ACM Symposium on Principles of Programming Languages, January 1982, pp. 128–132.
• Evaluation order determination
  • Davidson, J. W., 'A Retargetable Instruction Reorganizer', in Proceedings of the SIGPLAN '86 Symposium on Compiler Construction, 21(7), June 1986, pp. 234–241.

VPO Optimizations

• Link-time optimization
  • Benitez, M. E. and Davidson, J. W., 'A Portable Global Optimizer and Linker', in Proceedings of the SIGPLAN '88 Symposium on Programming Language Design and Implementation, June 1988, pp. 329–338.
• Memory access coalescing
  • Davidson, J. W. and Jinturkar, S., 'Memory Access Coalescing: A Technique for Eliminating Redundant Memory Accesses', in Proceedings of the SIGPLAN '94 Symposium on Programming Language Design and Implementation, Orlando, FL, June 1994, pp. 186–195.

VPO Optimizations

• Code motion
  • Benitez, M. E. and Davidson, J. W., 'The Advantages of Machine-Dependent Global Optimization', in Proceedings of the 1994 Conference on Programming Languages and Systems Architectures, Zurich, Switzerland, March 1994, pp. 105–124.
• Loop unrolling
  • Jinturkar, S. and Davidson, J. W., 'Improving Instruction-level Parallelism by Loop Unrolling and Dynamic Memory Disambiguation', in Proceedings of the 28th Annual IEEE/ACM International Symposium on Microarchitecture, Ann Arbor, MI, November 1995, pp. 125–132.

VPO Optimizations

• Branch minimization
  • F. Mueller and D. B. Whalley, 'Avoiding Conditional Branches by Code Replication', in Proceedings of the SIGPLAN '95 Conference on Programming Language Design and Implementation, June 1995, pp. 56–66.
  • M. Yang, G. Uh, and D. Whalley, 'Improving Performance by Branch Reordering', in Proceedings of the SIGPLAN '98 Conference on Programming Language Design and Implementation, June 1998, pp. 130–141.

VPO Optimizations

• Recurrence detection and optimization

•Benitez, M. E. and Davidson, J. W., ‘Code Generation for Streaming: an Access/Execute Mechanism’, in Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, April 1991, pp. 132–141.


Building VPO

[Diagram: a VPO Generator takes a CSDL specification for the target (SPARC, MIPS, ALPHA, or i486) together with the ZIF flow analysis and transformation libraries (register allocation, access coalescing, common subexpression elimination, evaluation order determination, induction variable elimination, instruction scheduling, code motion, SSA computation, plus any new transformation) and produces a target-specific optimizer, shown as VPO for MIPS.]

CSDL: Computing System Description Language

• Computing System Description Language
– Modular system of components
– Allows applications to customize a description
– Easily extensible for adding new details
– Reusable/application independent

CSDL

[Diagram: the CSDL Core surrounded by component descriptions: calling convention (CCL), memory system description (MSDL), pipeline description (PLUNGE), instruction representation (SLED), instruction semantics (λ-RTL), and object-file format.]

Zephyr Compilers

• EDG SUIF-to-VPO compiler
– Five targets (SPARC, Pentium, Alpha, MIPS, SimpleScalar)

[Diagram: Source Code → EDG Front End → SUIF Pass 1 → ... → SUIF Passes → SUIF-to-LIRA → LIRA-to-SPARC → RTL (SPARC) → VPO (SPARC) → Target Machine Code]

Zephyr Compilers

• EDG-to-VPO C++ compiler
– Funded by Edison Design Group
– Targeted to SPARC only
– Compiles all benchmark suites (SPEC, PGI, lcc)
– Code generator (translator from EDG intermediate representation to RTLs) provided as a literate program

Zephyr Compilers

• lcc-to-VPO C compiler
– Targeted to SPARC, X86, MIPS, ALPHA, and SimpleScalar
– Code generators (translators from LIRA to target-machine RTLs) provided as literate programs
– Currently producing good code, though some optimizations are not fully implemented or debugged

SPEC results for SPARC

Benchmark   gcc -O   lcc    vpolcc
go          13.4     6.45   11.0
m88ksim     5.70     4.98   6.2
li          8.98     5.93   7.48
compress    11.6     9.0    9.28
ijpeg       8.79     5.54   8.6
perl        12.3     9.2    10.2
vortex      10.7     8.27   11.2

Acknowledgements

• This work has been funded by:
– Defense Advanced Research Projects Agency
– National Science Foundation
– Panasonic AVC Labs
– Edison Design Group

Afternoon Schedule

Time Talk

1:30-2:00   ASDL: Dan Wang

2:00-2:55   Using Zephyr for PL Research: Kevin Scott
            – The VPO Code Generation Interfaces
            – LIRA: The lcc intermediate representation
            – SUIF-to-LIRA

2:55-3:15   Using Zephyr for Architecture Research: Jason Hiser and Chris Milner
            – Introduction
            – Handling a target machine’s calling convention

3:15-3:30   Break

Afternoon Schedule

Time Talk

3:30-4:30   Using Zephyr for Architecture Research (continued): Jason Hiser and Chris Milner
            – Writing a VPO machine description (md.y)
            – Writing a VPO register specification (regs.rt)
            – EASE: Environment for Architecture Study and Evaluation
            – Case Study: Targeting SimpleScalar

4:30-5:20   Using Zephyr for Optimization Research: Jack Davidson
            – Introduction to VPO’s optimization structure
            – Adding a new optimization to VPO

Afternoon Schedule

Time Talk

5:20-5:40   Zephyr support tools: Raja Venkateswaran
            – VET: Observing and debugging VPO
            – VPOISO: Isolating optimization errors

5:40-6:00 Wrap up and Open Discussion

Using Zephyr for Programming Language Research

Kevin Scott

University of Virginia

Overview

• Zephyr organization and philosophy
• VPO code generation interfaces
• Adding a new front-end to Zephyr:
– Using the Lira intermediate representation
– With a custom code expander using the VPO code generation interfaces
• Language related issues in retargeting Zephyr
• Q & A

What is Zephyr?

• Set of tools for generating and optimizing RTL programs
– VPO (Very Portable Optimizer)
  • SPARC, Alpha, x86, MIPS, SimpleScalar (PISA)
– Code Expanders
  • Turn a front-end’s IR into RTLs
– Glue for hooking front-ends up to VPO
  • VPO code generation interfaces
  • Lira IR
– Debugging tools
  • VET: interface for controlling and visualizing VPO transformations
  • vpoiso: isolates optimizer bugs

National Compiler Infrastructure

[Diagram: the National Compiler Infrastructure, combining the SUIF infrastructure and the Zephyr infrastructure. Front ends: SML/NJ, EDG C++, Ada95, DEC FORTRAN, Java, lcc, EDG C++, IBM C++ VisualAge; a legend marks some boxes as optional items. SUIF infrastructure (with MachSUIF): interprocedural analysis, parallelization and locality opts, object-oriented opts, scheduling, register allocation. A SUIF-to-VPO bridge connects the two. Zephyr infrastructure (VPO): instruction selection, register allocation, code motion, memory access coalescing, induction variable elimination, CSE, loop unrolling, inlining. Targets shown: Alpha, Sparc, MIPS, X86.]

Why use Zephyr?

• You’re a language researcher
– Easy to hook a front-end up to VPO
– Relatively little effort required to get multiple targets
– VPO is a very good optimizer
  • Wide range of existing operations
  • Leverage work of others contributing new optimizations to VPO
– Lets you concentrate on front-end issues
– Less work than writing a VPO-quality optimizer yourself

Zephyr Organization

[Diagram: the front ends (lcc, EDG, SUIF, VPCC) feed code expanders: Lira code expanders (SPARC, MIPS, Alpha, x86), EDG code expanders (SPARC), and CVM code expanders (SPARC, x86, MIPS). Expanders reach VPO through the VPOi and VPOasm interfaces.]

Four Front Ends

• VPCC: a K&R C compiler
– IR is code for a C virtual machine (CVM)
– Deprecated in favor of lcc front-end
• EDG: Edison Design Group C/C++
– Very flexible IR
• lcc: retargetable C compiler
– Simple backend emits Lira, an IR based on lcc trees
• SUIF 2.1
– High level optimizations and analyses
– suif2lira pass transforms SUIF IR into Lira

Code Expanders

• CVM Code Expanders
– SPARC, x86, MIPS
– Generate encoded RTL files directly; don’t use VPOi or VPOasm
• EDG Code Expanders
– SPARC
– First expander to use VPOi and VPOasm interfaces

Lira Code Expanders

• Targets
– SPARC
– X86
– Alpha
– MIPS32
– MIPS64 and SimpleScalar (PISA)
• Input Lira code specialized for target
• Output encoded RTLs for VPO
• All use the VPOi and VPOasm interfaces

VPOi

• VPOi provides a C interface for:
– Creating RTLs
– Sending RTLs to VPO for optimization
• Abstracts away specifics of:
– RTL representation
– How RTLs are sent to VPO
• RTL creation routines can be semi-automatically generated from a machine specification

VPOasm

• VPOasm provides a C interface for sending assembly language statements to VPO.

• Allows a code expander to:
– Change segments
– Define symbols
– Initialize storage locations
– Specify alignments for code or data

More on VPOi and VPOasm

• Why use these interfaces?
– Simpler than writing out VPO encoded RTL files manually.
– Can get some of the implementation for free if doing a new target architecture.
– Allows us to change RTL and assembly language representations without fouling you up. Much.
• Reference manual for VPOi and VPOasm:
– http://www.cs.virginia.edu/zephyr/vpoi

VPOi and VPOasm caveats

• Interfaces are written in C.
– Bad if you’re writing a code expander in languages with no mechanism for calling C functions.
• Interfaces are relatively rigid.
– Suppose you want to communicate something to the optimizer that doesn’t look like an RTL or assembly language.
• Interfaces have only been tested on C/C++ front ends.
– Might have to change to accommodate new language features…

Lira

• Simple IR based on lcc trees
• Targets a stack-oriented virtual machine
• Two types of entities in a Lira file:
– Instructions
– Directives

Lira Instructions

• Instruction is composed of:
– Operator (33)
– Type
  • F (float), I (signed integer), U (unsigned integer), P (pointer), V (void), B (aggregate)
– Size
  • 1, 2, 4, 8, …
– Auxiliary info

Operators:

CVF     ADD     MOD     GE      CALL
BCOM    NEG     LSH     EQ      ARG
CNST    INDIR   DIV     ASGN    NE
ADDRL   CVU     BXOR    SUB     LT      LABEL
ADDRG   CVP     BOR     RSH     LE      JUMP
ADDRF   CVI     BAND    MUL     GT      RET

Lira Instruction Example

• C fragment

int a;
a = a + 10;

• Lira translation

ADDRGP4 "a"
INDIRI4
CNSTI4 10
ADDI4
ADDRGP4 "a"
ASGNI4
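To see why this sequence computes a = a + 10 on a stack-oriented virtual machine, here is a minimal sketch of an evaluator. It is an illustration only, not part of Zephyr or lcc: the `Insn` struct, the `run` function, the operator names without type and size suffixes, and the modelling of addresses as indices into a `globals` array are all assumptions.

```c
#include <assert.h>

enum Op { ADDRG, INDIR, CNST, ADD, ASGN };

typedef struct {
    enum Op op;
    int arg;    /* global index for ADDRG, literal for CNST, unused otherwise */
} Insn;

/* Evaluate a Lira-like instruction sequence over a small global memory.
 * Addresses are modelled as indices into globals[]. */
void run(const Insn *code, int n, int globals[])
{
    int stack[32];
    int sp = 0;
    for (int i = 0; i < n; i++) {
        switch (code[i].op) {
        case ADDRG: stack[sp++] = code[i].arg; break;   /* push address */
        case CNST:  stack[sp++] = code[i].arg; break;   /* push literal */
        case INDIR: stack[sp - 1] = globals[stack[sp - 1]]; break;
        case ADD:   sp--; stack[sp - 1] += stack[sp]; break;
        case ASGN: {                /* address on top, value beneath it */
            int addr = stack[--sp];
            int value = stack[--sp];
            globals[addr] = value;
            break;
        }
        }
    }
    assert(sp == 0);    /* a complete statement leaves the stack empty */
}
```

Running the six instructions of the translation with `a` initially 5 leaves 15 in `a`. Note that the address for the assignment is pushed after the value, which is the order the translation uses.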

Lira Directives

• Change program segments with:
– code, data, bss, lit
• Specify alignment with:
– align
• Control symbol visibility with:
– import, export
• Initialize storage locations with:
– bytes, string, address, skip

Lira Directives (cont)

• Indicate procedure boundaries with:
– proc, endproc
• Describe procedure locals and parameters with:
– local, param
• Describe source coordinates with:
– file, line

Lira Directive Example

• Reserving storage for a global int "a"

-bss
-export a
-align 4
+LABELI4 "a"
-skip 4

The truth about Lira

• Lira can be emitted from lcc using a postorder walk of lcc trees. Almost.

• Typical case: the lcc tree

ADDI4
  INDIRI4
    ADDRGP4 "a"
  CNSTI4 10

is emitted in postorder as

ADDRGP4 "a"
INDIRI4
CNSTI4 10
ADDI4

The truth about Lira (cont)

• Sometimes, we don’t do a postorder traversal:

[Diagram: the lcc tree for a = a + 10, with ASGNI4 at the root over ADDRGP4 "a" and the subtree ADDI4(INDIRI4(ADDRGP4 "a"), CNSTI4 10)]

Emitted Lira:

ADDRGP4 "a"
INDIRI4
CNSTI4 10
ADDI4
ADDRGP4 "a"
ASGNI4

The truth about Lira (cont)

• A Lira program is specialized to the compilation target.
– Types, sizes and alignments are target specific
– Front-end must generate appropriate target dependent code for accessing the components of aggregates (arrays and structs)

Lira Code Expander

• Structured for simplicity.
• Code is generated by a big switch statement.
• Two passes made over the input.
– First gathers symbol information.
– Second generates code.
• SPARC expander is about 1800 lines of C. Close to half of the code is machine independent or easily reused on new targets.
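The "big switch statement" structure can be sketched as below. This is a toy, not the Zephyr expander: it emits RTL as text rather than calling VPOi, it omits the first (symbol-gathering) pass, and the `Insn` struct, the `expand` function, and the choice of 33 as the first pseudo register are assumptions. The input reuses the K = I + 1 example from the "Generating RTLx" slide, in that slide's operator order.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum Op { ADDRL, INDIR, CNST, ADD, ASGN };

typedef struct {
    enum Op op;
    const char *name;   /* local's name for ADDRL */
    int value;          /* literal for CNST */
} Insn;

/* Naively expand a postfix instruction sequence into RTL text: one RTL per
 * operator, with a fresh pseudo register for every result. */
void expand(const Insn *code, int n, char *out, size_t outsz)
{
    int stack[32], sp = 0;
    int next = 33;      /* first pseudo register (arbitrary choice) */
    size_t len = 0;
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        char line[64];
        int a, b;
        switch (code[i].op) {
        case ADDRL:     /* address of a local: frame register plus offset */
            snprintf(line, sizeof line, "r[%d]=r[14]+%s.;\n", next, code[i].name);
            stack[sp++] = next++;
            break;
        case INDIR:     /* load through the address on top of the stack */
            a = stack[--sp];
            snprintf(line, sizeof line, "r[%d]=M[r[%d]];\n", next, a);
            stack[sp++] = next++;
            break;
        case CNST:
            snprintf(line, sizeof line, "r[%d]=%d;\n", next, code[i].value);
            stack[sp++] = next++;
            break;
        case ADD:
            b = stack[--sp];
            a = stack[--sp];
            snprintf(line, sizeof line, "r[%d]=r[%d]+r[%d];\n", next, a, b);
            stack[sp++] = next++;
            break;
        case ASGN:      /* value on top, address beneath it */
            b = stack[--sp];
            a = stack[--sp];
            snprintf(line, sizeof line, "M[r[%d]]=r[%d];\n", a, b);
            break;
        default:
            line[0] = '\0';
            break;
        }
        if (len + strlen(line) < outsz) {   /* append only if it fits */
            strcpy(out + len, line);
            len += strlen(line);
        }
    }
}
```

No attempt is made to produce good code: every effective address and constant gets its own RTL and its own pseudo register, leaving the cleanup to the optimizer.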

Retargeting Lira code expander

• Three big tasks:
– Modify dumptree to map Lira ops onto RTLs for the new target. Easiest of the three since there is substantial opportunity for cut & paste coding.
– Modify sp_call to emit target dependent RTLs. On the SPARC we emit the following when the caller returns a struct:

    VPOi_rtl(ST(tmp_loc, sp_plus(r[14], SP_OFS-4)),
             VPOi_locSetBuild(tmp_loc, 0));

Retargeting Lira code expander

• Modify setup_frame to:
– Use right offsets for parameters and locals.
– Emit RTLs to do target dependent frame setup on procedure entry. For procedures returning a struct on the SPARC, we emit:

    VPOi_rtl(LD(sp_plus(r[30], SP_OFS-4), tmpreg), 0);
    locaddr = sp_plus_ra(r[30], locals.t[0].sym, 0);
    VPOi_rtl(ST(tmpreg, Rtl_fetch(locaddr, 32)),
             VPOi_locSetBuild(locaddr, tmpreg, 0));

Why use Lira?

• Lira is a pretty good intermediate language for C-like languages. (Thanks to Chris Fraser and Dave Hanson!)
– Abstracts away specifics of a target’s calling sequence! Left to code expander to implement.
• Separating Lira from lcc means that we can reuse the Lira code expanders for front-ends other than lcc. E.g., SUIF.
• Very easy to write a Lira code expander.

Lira References

• “A Retargetable C Compiler: Design and Implementation”

• lcc version 4.1 code generation interfaces
– http://www.cs.princeton.edu/software/lcc/pkg/doc/4.html
• More on the way…

Adding a front-end to Zephyr

• Is your language C-like?
– If yes then consider writing code to map your IR onto Lira. This gets you all of Lira’s targets almost for free.
– If no then you might need to write a code expander for each target you want to support.

Adding a front-end to Zephyr

• Is my target already supported?
– If yes then you’re golden.
– If no then you may have to do one or more of the following:
  • Create VPOi and VPOasm interfaces for your target. This can be partially automated.
  • Write a Lira code expander for the new target, or
  • Write a custom code expander for the new target.
  • Port VPO to the new target.

Adding a front-end using Lira

• Difficulty depends on your IR.
– Trivial for lcc: almost the same IR!
– Pretty easy for SUIF. E.g.

    void Translator::trans(BinaryExpression *exp) {
        int lira_op;

        // Operands first, then the operator: postorder emission.
        translate(exp->get_source1());
        translate(exp->get_source2());
        switch (op_map(exp->get_opcode())) {
        case SOP_add: lira_op = LIRA_ADD; break;
        ...
        }
        emitter->emit(lira_op, lira_map_ty(exp->get_result_type()));
    }

Where can I find out more?

• Should be releasing suif2lira as a literate program around July 1.
– Good starting point for someone familiar with SUIF wanting to hook up a front-end with Lira.
• Literate source for SPARC and x86 Lira code expanders will be available immediately after PLDI.

Adding a front-end using a custom code expander

• Difficulty again depends on your IR.

• Refer to EDG SPARC code expander:
– http://www.cs.virginia.edu/zephyr/dist/edg-sparc-1.0.pdf

Language issues in retargeting Zephyr

• Calling convention
– In addition to emitting RTLs to properly handle language calling conventions on function calls and function entry, you also need to consider fixentry in VPO.

– fixentry finalizes a procedure’s prologue after optimization is complete.

– More in next talk.

Using Zephyr for Architecture Research

Jason Hiser and Chris Milner

University of Virginia

A Brief Introduction to Zephyr and Architectural Research

Jason Hiser

University of Virginia

Roadmap

• Handling a machine’s calling convention
– Jason
• Break
– Coffee!
• Writing a VPO machine description and Writing a VPO register description
– Chris Milner
• Case Study: Targeting SimpleScalar
– Jason

Handling a Machine’s Calling Convention

fixentry fun (regs.c)

Jason Hiser

University of Virginia

Introduction To regs.c

• Fixentry: The main routine of regs.c
– Responsibilities of fixentry
• Parameters, external and global data used in fixentry
• Other functions: regarg, initmap, map, transfer, leaf

Responsibilities of Fixentry

• Calculate stack space needed
– outgoing parameters, spill locations, local variables, saved registers, and incoming parameters
• Emit function prologue
– adjust stack pointer
– save return address and saved registers
– add RTLs for local equates

Fixentry Responsibilities (continued)

• Create and maintain a “mapping” from the registers used to the actual hardware registers

• Save/restore necessary registers and incoming parameters to stack

• Emit function epilogue (including code to restore saved registers)


Not the responsibility of Fixentry

• Perform any optimization
• Insert spill code
• Make decisions about register usability
• Emit assembly code for any instructions
• Set up registers/stack for making a function call
• Allocate global data

Extern Variables (Where fixentry gets its data)

• struct bblock *top: list of basic blocks in current function
• struct locuse *locs: local variables and parameters
• int isused[MAXREGS]: which registers are used and which aren’t
• int varargs: is this a variable argument function?

Parameters to Fixentry

• struct list *ptr the RTLs in the current function

• struct blist *retb the basic blocks that need epilogue code


Global Variables

• int gpregmap[] The “mapping” of the general purpose registers

• int fpregmap[] The “mapping” of the float registers

• int spilloff Information to the code emitter about where to place spill variables

Calculating Stack Space

• Loop through RTLs and find out how much space is needed for outgoing params

• Loop through temps and calculate spill space needed

• Loop through locals and calculate local space needed


Calculating Stack Space (cont.)

• Loop through registers and find out which ones need to be saved

• Determine space needed for incoming parameters (register params only)
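The stack space calculation amounts to summing the sizes of the individual areas and rounding up to the target's stack alignment. The sketch below assumes a hypothetical `FrameAreas` struct and helper names; the real fixentry derives these sizes by walking the RTLs, temporaries, locals, and registers as described above, and the alignment is a target property.

```c
#include <assert.h>

/* Sizes, in bytes, of the areas that make up a frame (hypothetical layout). */
typedef struct {
    int outgoing_params;
    int spill;
    int locals;
    int saved_regs;
    int incoming_params;    /* register parameters only */
} FrameAreas;

/* Round n up to the next multiple of align (align must be a power of two). */
int round_up(int n, int align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Total frame size: the sum of all areas, rounded up to the stack alignment. */
int frame_size(const FrameAreas *f, int stack_align)
{
    int total = f->outgoing_params + f->spill + f->locals
              + f->saved_regs + f->incoming_params;
    return round_up(total, stack_align);
}
```

For example, areas of 16, 4, 10, 8, and 0 bytes sum to 38 bytes, which an 8-byte stack alignment rounds up to a 40-byte frame.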


Emitting Prologue and Epilogue

• Prologue
– Emit code to adjust stack pointer
– Emit code to spill return address and saved regs
• Epilogue
– For each exit block
  • Restore spilled registers
  • Restore stack pointer
  • Jump to return address

Register Map

• Register allocator determines what variables are in which register
– Fixentry needs to put these variables in the proper register.
• Fixentry attempts to map registers so no movements are necessary, overriding the allocator assignment policy
– If it can’t, register to register moves are necessary

Other Functions of regs.c

• regarg: Boolean function; returns true if a local variable is an argument and enters the function in a register
• initmap: initializes the gpregmap and fpregmap
• map: returns the mapping for a register

Other Functions of regs.c (continued)

• transfer: creates a transfer RTL from two machine locations (memory, register, or spill)
• leaf: Boolean function; determines if a function is a leaf
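The register mapping idea can be sketched with a simple table. The names `gpregmap`, `initmap`, and `map` come from the slides, but their signatures and behavior here are assumptions, and `remap` is a hypothetical helper added only for illustration; the actual regs.c may be organized quite differently.

```c
#include <assert.h>

#define MAXREGS 32

int gpregmap[MAXREGS];

/* Start with the identity mapping: every register maps to itself. */
void initmap(void)
{
    for (int i = 0; i < MAXREGS; i++)
        gpregmap[i] = i;
}

/* Return the hardware register that register reg is mapped to. */
int map(int reg)
{
    assert(reg >= 0 && reg < MAXREGS);
    return gpregmap[reg];
}

/* Record that the value the allocator placed in `assigned` should live in
 * `hw` instead (for example, the register an incoming parameter arrives in),
 * so that no register-to-register move is needed. The two entries are
 * swapped so the mapping stays a permutation. */
void remap(int assigned, int hw)
{
    for (int i = 0; i < MAXREGS; i++) {
        if (gpregmap[i] == hw) {
            gpregmap[i] = gpregmap[assigned];
            break;
        }
    }
    gpregmap[assigned] = hw;
}
```

After `initmap()` and `remap(16, 4)`, `map(16)` yields 4 and `map(4)` yields 16, while every other register still maps to itself.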

Summary

• Fixentry is the main portion of regs.c

• Fixentry is responsible for
– function prologue
– function epilogue
– register mapping to avoid register to register moves

• Regs.c also contains a few functions to let other areas know about the mapping.

Using Zephyr for Architecture Research

(continued)

Jason Hiser and Chris Milner

University of Virginia

Writing a VPO Machine Specification

Chris Milner

University of Virginia

Outline of talk

• Structure of VPO
• Machine descriptions
• How to construct the descriptions
• Getting machine dependent information for machine independent transformations
– combiner
– loop (and other) transformations
– scheduler
• EASE

Structure of VPO

[Diagram: the VPO optimizer is built by a C compiler from two bodies of C code. Machine independent source: CSE, strength reduction, dead code elimination, ..., with entry points such as combiner() and loop_strength(). Machine dependent source: simp.c, rtl.c, sched.c, with entry points such as inst_is_legal() and is_basic(). Further machine dependent C code is generated from descriptions: the instruction description md.y by the instruction processor yyfast, the register description reg.rt by the register processor regtool, and the pipeline description pipe.pg by a pipeline processor ("real soon now").]

VPO

• “Machine independent” transformations on low level “machine dependent” intermediate form (register transfer lists)

• Retargeted portion assists in:
– recognizing legal RTLs
– converting and inserting RTLs to assist transformations
– picking apart RTLs to get information

Role of Machine Descriptions

• md.y - legal instructions
– maintains VPO invariant
– YACC grammars
• regs.rt - register file
– register types
– alignment
– size
– ABI

md.y

• RTL recognizer
– Workhorse
– RTLs come from combiner (at compile time)
– ours are not the usual table-driven parsers but directly executable ones (yyfast)
• How do you do it?
– Work from existing ones (derive Alpha from MIPS); or,
– construct one anew

Sample machine

• Subset SIMPLESCALAR
– e.g. student project on FPGA
– load/store
– chars, half words and words
– constants must be loaded into registers
– add, and, not, sll, sra, srl
– branch on less than, branch on equal, jump, call, return

Constructing md.y (continued)

• Operands - registers

%token REG0 REG1 REG2
(scanner converts 'b' '[' '1' ']' to REG0)

reg: REG0
   | REG1
   | REG2

Constructing md.y (continued)

• Operands - memory

%token BMEM WMEM RMEM
(scanner converts 'B' '[' to BMEM)

mem: BMEM reg ']'
   | WMEM reg ']'
   | RMEM reg ']'

Constructing md.y (continued)

• Operands - misc

%token PC RT ST (used for call and return)
%token LOCAL GLOBAL CON LBL

expr: LOCAL
    | GLOBAL
    | CON
    | LBL

Constructing md.y (continued)

• Operations

%left '=' '+' '&' '"' '{' '}'
%nonassoc '~' ','

rhs: reg '+' reg
   | reg '&' reg
   | reg '{' reg
   | reg '}' reg
   | reg '"' reg

Constructing md.y (continued)

• Binary operations

binops: reg '=' rhs

• Unary operation

not: reg '=' '~' rhs

Constructing md.y (continued)

• Load, load immediate and store

l : reg '=' mem
li: reg '=' expr
s : mem '=' reg
si: expr '=' reg (FORTRAN)

Constructing md.y (continued)

• Branch

bb: PC '=' reg ':' reg
  | PC '=' reg '<' reg

• Jump, call and return

jmp: PC '=' reg
jal: ST '=' expr
ret: PC '=' RT

Constructing md.y (continued)

• All instructions

inst: bb | jmp | jal | ret
    | binops | not
    | l | li | s

• Now, we need some glue and some checking

Glue for parser

• Build up semantic records
• Found in isem.c
– addr() - record for addressing mode

    reg: REG0 {$$=addr(BYTE,BREGISTER…)}

– memref() - record for memory access
– brecord() - record for binary op
– rrecord() - record for relational op
– same() - ensure records are same

Semantic routines

• inst.c
– each instruction or instruction class has a routine
– routine checks for legal operands
– is responsible for emitting legal asm
– e.g. bb()
  • on MIPS, checks the semantics for compare and branch
  • if the right-hand operand is an immediate, uses the immediate form of the instruction
  • records instruction type

Structure of VPO (again)

[Diagram: repeat of the "Structure of VPO" build diagram: machine independent C code (CSE, strength reduction, dead code elimination, ...) and machine dependent C code (simp.c, rtl.c, sched.c, plus code generated from md.y by yyfast, from reg.rt by regtool, and from pipe.pg by the pipeline processor) are compiled by a C compiler into the VPO optimizer.]

regs.rt

• TYPES
  – basic types of registers on the machine
  – byte, half, word, float, double
  – BTREG, WTREG, RTREG, FTREG, DTREG
• CODES
  – condition codes
  – IC, FC, etc.


regs.rt (continued)

• CLASS
  – general_purpose, float, spill
  – number
  – scratch
  – reserve


regs.rt (continued)

• CLASS (continued)
  – type
    • alignment (even-odd register pairs)
    • size - how many to allocate
    • invariant - mark as invariant for loops
      – e.g. fp and sp
    • memchar, regchar - give it a different name
    • stack, fifo - tells the allocator about them


regs.rt for MIPS

types BTREG, WTREG, RTREG, FTREG, DTREG

codes FC

class = general_purpose

number = 32

scratch = 2..15, 24, 25

reserve = 0, 1, 26, 27, 28, 29, 31

(notes: MIPS - reg 0 is zero, reg 1 is asm reg, regs 26 and 27 are used by the OS, reg 28 is gp, reg 29 is sp, reg 31 is return address)


regs.rt for MIPS (continued)

type = RTREG

alignment = 1

size = 1

invariant = 28, 29

endtype

type = BTREG, WTREG

alignment = 1

size = 1

endtype


regs.rt for MIPS (continued)

class = floating_point

number = 16

scratch = 0..9

type = FTREG, DTREG

alignment = 1

size = 1

endtype

endclass


regs.rt for MIPS (continued)

class = SPILL

number = 32

type = BTREG, WTREG, RTREG, FTREG

alignment = 1

size = 1

endtype

type = DTREG

alignment = 2

size = 2

endtype

endclass


Structure of VPO (again)

[Figure: structure of VPO, repeated from the earlier slide of the same title.]


Other files

• simp.c - helps the combiner
• sched.c - machine specific portion of scheduling
• rtl.c - routines to find machine idioms in transformations


simp.c

• Combine RTLs in machine dependent way
• e.g. SPARC

1     r[35]=~r[35]
2 {1} r[33]=r[33]&r[35]

combines to

r[33]=r[33]&~r[35]

semantically ok, but not an instruction

comp() makes machine idiom substitution

r[33]=r[33] ANDNOT r[35]


simp.c (continued)

• e.g. SPARC constants: 4095 is biggest immediate

1     r[40]=4095
2 {1} r[41]=r[40]+13

combines and folds to

r[41]=4108

comp() converts to

r[41]=HI[4108]
r[41]=r[41]|LO[4108]


rtl.c

• Manipulate
  – reverse() - reverse a branch
  – don’t_bother_with() - tell cse to ignore
• Predicates
  – is_call(), is_rjmp(), ismem(), writes_mem()
  – is_pc(),
• Pick apart
  – findlabel(), usetype()


rtl.c (continued)

• Insert code to help transformations
  – store(), load()
  – multconst()
    • add series of shifts and adds
  – locsub() - substitute reg for mem
    • SPARC has sign extend on load
    • no single sign extend move
    • have to insert shifts to do sign extend


rtl.c (continued)

r[1] = 0

r[9] = r[14] + a

L32:

r[8] = r[1]*4

R[r[8]+r[9]]=0

r[1]=r[1]+1

IC=r[1]?100

PC=IC<0,L32

• regular induction variable
• induced expression
• basic induction variable

• Assist loop strength reduction
  • might be one instruction or several


sched.c

• SPARC - yes, MIPS - no
• Scheduler uses mostly machine independent list scheduling algorithm

• keeps machine specific dependencies straight

• helps avoid hazards


sched.c (continued)

• md_sets_uses
  – what an instruction does
  – what an instruction is blocked by
  – reads can slide past reads, not past writes
      rtl->does |= READS
      rtl->blocks |= WRITES
  – writes cannot slide past anything
      rtl->does |= WRITES
      rtl->blocks |= WRITES | READS


sched.c (continued)

• md_sets_uses
  – condition code users can’t slide past one another
      rtl->does |= ICWRITES
      rtl->blocks |= ICWRITES | ICREAD
    and
      rtl->does |= ICREADS
      rtl->blocks |= ICWRITES | ICREAD
  – calls are treated conservatively
    • assume codes, floats and memory written


sched.c (continued)

• sched_adv()
  – relative advantage or disadvantage of scheduling this instruction next
  – relative to last instruction scheduled
  – e.g. SPARC
    • space out float instructions
    • avoid consecutive stores
    • make consecutive instructions independent


EASE

• EASE: Environment for Architecture Study and Experimentation
  – VPO includes a facility for obtaining
    • Measurements of instruction usage
    • Instruction cache traces
    • Data cache traces
    • precise timing
  – VPO provides facilities for emulating architectures
    • Can extend existing architectures


EASE (continued)

• Use control-flow graph to insert instrumentation code

• Low overhead (10 to 15%)

• Cache traces generated on the fly (no need to store)

[Figure: basic blocks, each preceded by a "Bump Counter" instrumentation step]


EASE (continued)

• Emulation of new architecture features
  – Add new instructions to machine description
  – Generate code and optimize as if new features exist
  – In last step of VPO, emit code to emulate new features

r[3] = r[3] + r[2]
r[5] = r[5] + (r[3] * r[2])

[Figure: VPO Mach, last step]

add r2, r3, r3
mul r3, r2, r1
add r1, r5, r5

Case Study: Targeting SimpleScalar

Jason Hiser
University of Virginia


Introduction

• What is SimpleScalar? Why use it?

• Why use VPO with SimpleScalar?
  – SimpleScalar comes with gcc, why not use that?

• Experiences in porting VPO to SimpleScalar

• Research with SimpleScalar and VPO


What is SimpleScalar?

• SimpleScalar is a functional simulator designed for use with architectural research
  – sim-safe -- a simple, fast simulator
  – sim-bpred -- measures branch predictor statistics
  – sim-cache -- measures cache statistics
  – sim-outorder -- models a multi-issue, out of order superscalar processor


Why Use SimpleScalar?

• Easy to model many common architectural features.
  – hybrid branch predictors, arbitrarily many functional units, much more
• Extendible instruction set -- PISA
  – Allows any instruction to be “annotated”
    • easy to create new instructions or add fields to old ones
• Comes with GNU tools for SimpleScalar
  – gcc, gas, gld, glibc, etc.


Why VPO and SimpleScalar? (Why not use gcc?)

• gcc does not generate instruction annotations

• difficult to write new optimizations to take advantage of new instructions

• just building gcc can be a challenge


Why VPO and SimpleScalar? (continued)

• Easily build VPO on any machine where you can build SimpleScalar

• Describe new instructions in machine description and optimizer will automatically use them when beneficial

• New optimizations can consult the machine description to see if architectural support is available
  – allows portability of optimizations


Experiences with Porting VPO to SimpleScalar

• PISA is basically MIPS
  – changes to some instruction formats
  – dmfc1 appears to be broken, negu not available, branch if (not) equal to zero instructions don’t exist
• Change instruction format in inst.c
• When compiling for SimpleScalar, tell the machine description that negu, beqz, bneqz and dmfc1 are not available


Research with SimpleScalar and VPO at UVa

• Idea
  – Compiler managed on-chip memory can provide performance and power benefits
• Framework
  – Add instructions to move data to/from on-chip memory from/to registers
    • to VPO (in md.y, inst.c)
    • to SimpleScalar (machine.def)
  – Add optimization to promote variables from cache to on-chip memory


Summary

• SimpleScalar is a versatile functional simulator

• Porting VPO isn’t difficult
  – SimpleScalar target soon to be included with VPO

• VPO and SimpleScalar make a great vehicle for architectural research

Using Zephyr for Optimization Research

Jack Davidson
University of Virginia


VPO Logical Structure

[Figure: VPO logical structure. Labels: VPO Generator; ZIF; Flow Analysis & Transformation Libraries (Register Allocation, Access Coalescing, Comm. Subexpr. Elim., Eval. Order Determ., Induction Var. Elim., Instruction Scheduling, Code Motion, SSA Computation); New Transformation; CSDL SPARC Specification; CSDL MIPS Specification; CSDL ALPHA Specification; CSDL i486 Specification; VPO MIPS.]


Actual Structure

[Figure: VPO, with components lib, SPARC, MIPS, X86, ALPHA]


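VPO Program Representation

[Figure: TOP leads to a chain of BASIC BLOCK nodes; each basic block holds a LIST of RTL structs. Fields shown for each list entry: RTL, COST, INST TYPE, USES, SETS, DEF/USE. Fields shown for each basic block: PREDS, IDOMSDOM, NEST LVL, USES, DEFS, OUTS, PHI, REGSTATE.]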


VPO Optimizations

• Review vpo.h


VPO Optimization Algorithm

repeat
    apply code-improving transformation
until fixed-point reached or exhausted registers

• Maintaining two invariants
  – Semantic invariant (S)
    • Observable behavior of program unchanged (according to RTL semantics)
  – Machine invariant (M)
    • Every RTL equivalent to one machine instruction


VPO code optimization

• Each code-improving transformation is
  – machine-level, but
  – machine-independent

• Any semantics-preserving transformation is OK

• Preserve machine invariant (M) using machine description;
  – for each new RTL produced, ask MD if OK
  – if any is not target machine instruction, roll back transformation


VPO Optimization Driver

• Review vpo.c


Adding a new optimization

• Determine where in optimize to insert the function
  – What analyses does the optimization need?
    • Control-flow optimizations usually come first as they need very little data-flow information
    • Data-flow optimizations follow: code motion, induction-variable elimination, common subexpression elimination
  – Does the optimization operate on a single basic block or does it operate across basic blocks?


Adding a new optimization

• Browse controlflow.c/fix_control_flow()

• Browse cdmotion.c/code_motion()


Semantic Safe Points

• A semantic safe point is a point in the optimization process where the code satisfies the M and S invariants
  – Code can be emitted at any semantic safe point and it should run correctly
  – Can insert a new optimization at any semantic safe point


Debugging the compiler

[Figure: Source Code -> Front and Middle Ends -> RTL -> VPO Mach -> Machine Code, with VPO Mach applying Trans 1, Trans 2, Trans 3, Trans 4, ..., Trans n]


VET - VPO Examination Tool

• Allows transformations to be observed
  – Observe data structure (control-flow graph)
  – Set a break point at a transformation
  – Set a break point at a phase
  – Replay a transformation

VET and VPOISO

Raja Venkateswaran
UVA


VET

• VET -> VPO Examination Tool
• GUI for viewing optimizations
• By Phase and By transformation
• Ability to revert to previous phases
• Wide range of user options


VPOISO

• Tool for isolating optimizer bugs

• Uses binary search to find the first transformation error

• Works by comparing against the correct output
