Effective Compilation Support for Variable Instruction Set Architecture

Jack Liu, Timothy Kong, Fred Chow

Cognigine Corp.

www.cognigine.com

Outline

1. VISC Architecture

2. Compile-time Configurable Code Generation

3. Managing the Dictionary

4. Concluding Remarks

Configurable Computing

Motivation

• Higher performance

• processor and instruction set customized to type of application

• Lower hardware cost

• non-essential features excluded

• Shorter time-to-market

Variable Instruction Set Architecture (VISC Architecture™)

A new approach to configurable computing:

• Fixed processor hardware

• Many types of operations provided

• Numerous instruction variants (CISC-style)

• Per-program instruction set tailoring during compile time

Background of this work

Cognigine CGN16100 Network Processor

• Single-chip, fully programmable network processor

• Processing cores: 16 Re-configurable Communications Units (RCU) processor cores
  – VISC architecture
  – 4 64-bit parallel execution units
  – Multi-threaded
  – 512 KB on-chip memory (text and data)

VISC Architecture™

• Dictionary: instruction set for current program

• Dictionary entry sizes:
  – 32-bit: 2 operations
  – 64-bit: 4 operations
  – 128-bit: 8 operations

• Instruction format: opcode (8-bit), opnd0, opnd1, opnd2, opnd3
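As a rough illustration of the entry sizes listed above, the following sketch picks the smallest dictionary entry that can hold a given group of operations. It assumes each entry size may hold up to the stated number of operations. The names `ENTRY_SIZES` and `entry_bits` are hypothetical and do not come from the Cognigine tools.

```python
# Entry sizes as listed on the slide: (width in bits, operations held).
ENTRY_SIZES = [(32, 2), (64, 4), (128, 8)]


def entry_bits(num_ops):
    """Return the smallest entry width (in bits) that fits num_ops operations.

    Assumption: an entry may hold fewer operations than its maximum.
    The slides do not spell out how unused slots are encoded.
    """
    if num_ops < 1:
        raise ValueError("an entry needs at least one operation")
    for bits, capacity in ENTRY_SIZES:
        if num_ops <= capacity:
            return bits
    raise ValueError("no entry size holds more than 8 operations")


if __name__ == "__main__":
    for n in (1, 3, 8):
        print(n, "operation(s) ->", entry_bits(n), "bit entry")
```

Under this model, packing more operations into one entry costs dictionary bits but still needs only one 8-bit opcode in the instruction stream.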

Motivation for VISC Architecture

1. Efficient way to encode/decode the many operation variants with different addressing modes

• Not all used in each program

2. High instruction encoding density

• Small opcode bit count

• Operands shared among multiple operations

3. Simplified control logic for VLIW-style ILP

• Up to 8 operations per cycle

Operation Specification

In Dictionary Entry (only specified once):

1. Operation name

2. Operation variants:
  • Signed and unsigned
  • Operand and result sizes: 8-bit, 16-bit, 32-bit, 64-bit
  • Support different sizes among operand(s) or result
  • Vector: 64v8, 64v16, 64v32, 32v8, 32v16

3. Data path to each operand/result

In Instruction:

1. Operands' encoding formats

2. Actual operands

RCU Architecture

• 5-stage pipeline
• 4-way multi-threaded
• Hardware RSF synchronization
• 128-bit reconfigurable address path
• 256-bit reconfigurable data path

[Figure: RCU block diagram. Labeled blocks: Execution Units, Pointer File, Dictionary, Registers, Scratch Memory, Packet Buffers, Data Memory, Instruction Cache, RSF Connector, Source Route, "Back-side" Ports. Path widths shown: 64 and 128 bits.]

Roles of Compiler for VISC Architecture

1. Determine the instruction set to store in the dictionary for the best execution-time performance

2. Generate optimized code sequence based on best instruction set

3. Cater to various hardware limitations:

• Dictionary limit

• Data path constraints

• Dictionary and Instruction encoding constraints

New Compilation Approach: Configurable Code Generation

• Exact form of generated instructions decided in the last instruction scheduling phase

• Direct result of instruction compaction based on what is allowed by the hardware

Compiler Implementation Method

• Retarget SGI Pro64 (Open64) compiler to an Abstract Machine

• Code generator operates on an Abstract Operation Representation
  – Code generation optimizations left intact

• Add new Instruction and Dictionary Finalization (IDF) phase as post-pass

  IDF Phase 1:
  – Instruction scheduling and folding
  – Abstract operations converted to target code sequence

  IDF Phase 2:
  – Output VISC instructions and dictionary entries

Compiler Phase Structure

[Figure: compiler phase structure. Labeled blocks: GNU / Pro64™ Front-end; WHIRL Optimizer; Code Generator; Pro64™ Back-end. Output: Assembly Program (Instructions, Dictionary).]

Abstract Operation Representation (AOR)

Each operation corresponds to a micro-operation in the core execution units

• RISC-like formats
  – r1 = op r2, r3
  – r2 = load <offset>(<base>)
  – store r2 <offset>(<base>)
  – r1 = loadimm <imm>

• Optimizations in AOR reflected in final code

• No pre-disposition of compiler to any specific instruction format

Multiple AOR ops can be combined to single target operation

Operations taking immediate operand

  r2 = move <imm>        =>    r3 = addi r1 <imm>
  r3 = add r1, r2

Operations supporting memory operands

  r2 = load 4(sp)        =>    r3 = add r1 4(sp)
  r3 = add r1, r2

Post incre/decre memory operations

  r2 = load 0(r1)        =>    r2 = load 0(r1++)
  r1 = addi r1, 4

Branches on condition codes

  r1 = add r2, r3              r1 = add r2, r3
  . . .                  =>    br.z label (only if immediately after)
  compare (r1 != 0)
  br.z label

Others
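The first folding pattern above (a move of an immediate followed by an add) can be sketched as a two-operation peephole pass. This is only an illustration: the tuple layout `(dest, opcode, srcs)` and the function name `fold_immediates` are hypothetical. The sketch also assumes the temporary register is not live after the basic block, which a real compiler would check with liveness information.

```python
from collections import Counter


def fold_immediates(ops):
    """Fold 'rT = move <imm>; rD = add rS, rT' into 'rD = addi rS, <imm>'.

    ops is a list of (dest, opcode, srcs), where srcs is a tuple of register
    names (str) or immediates (int). The fold is applied only when rT has a
    single use in the block, so dropping the move cannot change other code.
    """
    uses = Counter(s for _, _, srcs in ops for s in srcs if isinstance(s, str))
    out = []
    i = 0
    while i < len(ops):
        dest, opcode, srcs = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        foldable = (
            opcode == "move"
            and isinstance(srcs[0], int)
            and uses[dest] == 1
            and nxt is not None
            and nxt[1] == "add"
            and len(nxt[2]) == 2
            and nxt[2][1] == dest
            and nxt[2][0] != dest
        )
        if foldable:
            # Replace the pair with one add-immediate operation.
            out.append((nxt[0], "addi", (nxt[2][0], srcs[0])))
            i += 2
        else:
            out.append(ops[i])
            i += 1
    return out


if __name__ == "__main__":
    block = [("r2", "move", (5,)), ("r3", "add", ("r1", "r2"))]
    print(fold_immediates(block))
```

The memory-operand and post-increment patterns have the same shape: match a short producer/consumer pair, check that the intermediate value has no other use, and emit the combined operation.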

IDF Approach

Instruction scheduling + following tasks:
– Instruction folding
– Opcode selection
– Modelling of irregular hardware constraints
– Modelling of encoding constraints
– Monitoring of states of condition codes and transient registers
– Keeping track of dictionary contents

Use enumeration (branch and bound) approach

Example of IDF Processing

Input operations:

  $w80 = move 0x55
  $w91 = move 0xf8
  $w70 = add $w70, $w80
  $w71 = xor $w92, $w80
  $w90 = sub $w92, $w91
  store 8($p1) = $w90

Dictionary entry 3: add xor sub nop

Instruction: op3 8($p1) $w70 0x55 0xf8

• move and store instructions subsumed
• $w71, $w92 mapped to transient registers
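To make the relationship between the instruction and its dictionary entry concrete, the sketch below expands the example instruction back into its operations. The data-path description in `ENTRY_3` is a hypothetical model chosen to match the six input operations. The slides do not show how data paths are actually encoded, and `t0`/`t1` merely stand in for the transient registers that hold $w71 and $w92.

```python
# Hypothetical model of dictionary entry 3: each operation names where its
# result goes and where its sources come from. "opndN" refers to operand
# field N of the instruction; "t0" and "t1" stand for transient registers.
ENTRY_3 = [
    ("add", "opnd1", ("opnd1", "opnd2")),
    ("xor", "t0", ("t1", "opnd2")),
    ("sub", "opnd0", ("t1", "opnd3")),
    ("nop", None, ()),
]


def expand(entry, operands):
    """Substitute the instruction's operand fields into the entry's operations."""

    def resolve(name):
        if name is not None and name.startswith("opnd"):
            return operands[int(name[4:])]
        return name  # transient register (or None for nop)

    return [
        (op, resolve(dest), tuple(resolve(s) for s in srcs))
        for op, dest, srcs in entry
        if op != "nop"
    ]


if __name__ == "__main__":
    # Operand fields of "op3 8($p1) $w70 0x55 0xf8".
    operands = ("8($p1)", "$w70", "0x55", "0xf8")
    for operation in expand(ENTRY_3, operands):
        print(operation)
```

In this model the immediate 0x55 is shared by the add and the xor, and the sub writes straight to memory, which is how the two moves and the store disappear from the instruction stream.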

IDF Scheduling Algorithm

Input: sequence of operations in BB

1. Estimate initial bound_sch
2. Search for schedule with length <= bound_sch
3. Succeed? If not, set bound_sch = bound_sch + 1 and repeat step 2

To speed up the search, shrink solution space by:
– Coming up with high initial bound_sch
– Prune useless search paths continuously
  • Tight hardware constraints help
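The loop above can be sketched as a bounded depth-first enumeration. This is a simplified illustration, not the IDF implementation: it models only dependences and a fixed issue width with unit latency, and it ignores folding, encoding constraints and the dictionary. The function names and the choice of initial bound (the larger of the critical path length and the resource bound) are assumptions.

```python
import math
from itertools import combinations


def critical_path(n, deps):
    """Longest dependence chain in cycles; ops are assumed to be in program order."""
    depth = [0] * n
    for i in range(n):
        depth[i] = 1 + max((depth[p] for p in deps.get(i, ())), default=0)
    return max(depth, default=0)


def search(n, deps, width, bound):
    """Return a schedule (list of per-cycle groups) of length <= bound, or None."""

    def step(done, schedule):
        if len(done) == n:
            return schedule
        remaining = n - len(done)
        # Prune: even at full issue width the bound cannot be met.
        if len(schedule) + math.ceil(remaining / width) > bound:
            return None
        ready = [i for i in range(n)
                 if i not in done and all(p in done for p in deps.get(i, ()))]
        # Try the fullest groups first so short schedules are found early.
        for k in range(min(width, len(ready)), 0, -1):
            for group in combinations(ready, k):
                found = step(done | set(group), schedule + [group])
                if found is not None:
                    return found
        return None

    return step(frozenset(), [])


def schedule_block(n, deps, width):
    """Raise the bound one cycle at a time until the search succeeds."""
    bound = max(critical_path(n, deps), math.ceil(n / width))  # initial bound_sch
    while True:
        result = search(n, deps, width, bound)
        if result is not None:
            return bound, result
        bound += 1


if __name__ == "__main__":
    # deps maps an operation index to the indices it depends on.
    deps = {2: (0,), 3: (0,), 4: (1,), 5: (4,)}
    print(schedule_block(6, deps, width=4))
```

A higher initial bound means fewer failed searches before the first success, which is why the estimate matters as much as the pruning.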

Managing the Dictionary

• Dictionary usage increases due to:
  – Program size: more variety of operations
  – High ILP: more combinations of operations
  – Library code linked in

• Currently, dictionary contents fixed for each executable

• Role of linker:
  – Merge dictionary entries with identical contents across files/libraries
  – Error message on dictionary overflow

• Role of compiler:
  – Maximize dictionary entry re-use
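The linker's role can be sketched as a merge that interns identical entries and reports overflow. The limit of 256 entries is an assumption derived from the 8-bit opcode, since the slides do not state the actual dictionary capacity. The function name and the representation of entries as tuples of operation names are likewise hypothetical.

```python
DICT_LIMIT = 256  # assumption: one entry per 8-bit opcode value


def merge_dictionaries(objects, limit=DICT_LIMIT):
    """Merge per-file dictionaries, sharing entries with identical contents.

    objects is a list of dictionaries, one per input file, each a list of
    entries (tuples of operation names). Returns (merged, remaps), where
    remaps[f][old_index] is the entry's index in the merged dictionary.
    """
    merged = []
    index_of = {}
    remaps = []
    for entries in objects:
        remap = []
        for entry in entries:
            if entry not in index_of:
                if len(merged) >= limit:
                    raise OverflowError(
                        "dictionary overflow: more than %d entries" % limit)
                index_of[entry] = len(merged)
                merged.append(entry)
            remap.append(index_of[entry])
        remaps.append(remap)
    return merged, remaps


if __name__ == "__main__":
    file_a = [("add", "xor"), ("load", "add")]
    file_b = [("load", "add"), ("sub", "nop")]
    merged, remaps = merge_dictionaries([file_a, file_b])
    print(len(merged), remaps)
```

The remap tables are what a linker would use to rewrite opcodes in each file so that they point at the shared entries.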

Dictionary Compilation

Strategy:

• Keep track of existing dictionary entries during compilation
  – Extract dictionary entries from:
    • Libraries and .s files being linked
    • .o files compiled before current file
    Example: cc a.c b.o c.s
  – Maintain table of existing dictionary entries
  – Add to table as new entries are generated

• Re-use existing dictionary entries

• Bias scheduling towards dictionary conservation as dictionary fills up

User Control of Dictionary Compilation

Best program performance demands a near-full dictionary. When the dictionary overflows, the program needs to be re-compiled.

Provide user control mechanisms:
– Trade-off between dictionary consumption and program performance
– Command line option: -CG:dict_usage=n (n = 0…10)
– Embedded in code: #pragma dict_usage n

dict_usage is the dictionary budget guideline for IDF:
– Low dict_usage:
  • Fewer new dictionary entries created
  • Low ILP
– High dict_usage:
  • Tighter instruction schedule
  • More dictionary entries created

IDF Support of dict_usage

Additional search goal bound_dict:
– Number of new dictionary entries allowed for current BB
– Automatically adjusted lower with more pre-existing entries

When bound_dict is reached during enumeration, disallow creating new dictionary entry (unless single operation)

[Figure: instructions and dict entries plotted against dict_usage (10, 8, 3, 2, 0)]
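The rule for bound_dict can be written as a small predicate that the enumeration would consult before committing to a candidate entry. This is a sketch under assumptions: how bound_dict is derived from dict_usage is not given in the slides, so it is taken as an input here, and the function name and entry representation (a tuple of operation names without padding) are hypothetical.

```python
def entry_allowed(entry, existing, new_entries, bound_dict):
    """Decide whether the search may use this dictionary entry.

    entry       -- tuple of operation names for the candidate instruction
    existing    -- set of entries already in the dictionary
    new_entries -- number of entries already created for the current BB
    bound_dict  -- number of new entries allowed for the current BB
    """
    if entry in existing:
        return True  # re-using an entry consumes no dictionary space
    if len(entry) == 1:
        return True  # single-operation entries stay allowed
    return new_entries < bound_dict


if __name__ == "__main__":
    existing = {("add", "xor")}
    print(entry_allowed(("add", "xor"), existing, new_entries=2, bound_dict=2))
    print(entry_allowed(("add", "sub"), existing, new_entries=2, bound_dict=2))
    print(entry_allowed(("sub",), existing, new_entries=2, bound_dict=2))
```

Keeping single-operation entries always available means the search can still produce some schedule once the budget is exhausted, at the cost of lower ILP.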

Experimental Results

Summary (with dict_usage=10):
• ILP from IDF scheduling: 1.38 ops per instruction
• ILP from relaxed scheduling: 1.51 ops per instruction
• 23% of all subsumable operations subsumed
• Each dictionary entry referred to by 2.63 instructions (statically)
• Scheduling via enumeration: 100 times slower than one-pass schedulers
• Compilation time: 1 to 2 minutes per program

Concluding Remarks

• VISC approach most suitable for embedded processors
  – Limited program size
  – Dictionary space less of an issue
  – Slow compilation tolerable
  – CISC-style instructions enable small code size

• Compilation support key to deploying applications on VISC
  – Very hard to write in assembly language
  – Advanced optimizations performed by compiler
  – Dictionary managed by compiler with user hints

• Compile-time configurable code generation enables RISC compilation techniques to generate CISC output
