bruno cardoso lopes [email protected] llvm developers

38
Object Code Emission & llvm-mc Bruno Cardoso Lopes [email protected] LLVM Developers Meeting, 2009 Cupertino, CA

Upload: trinhliem

Post on 01-Jan-2017

216 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

Object Code Emission& llvm-mc

Bruno Cardoso [email protected]

LLVM Developers Meeting, 2009Cupertino, CA

Page 2: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

Introduction

• Motivation

• Background

• Actual Code Emission

• Object Code Emission

• llvm-mc

Page 3: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

Motivation

Compiler

• Known path

• Object code path

.oAssembler Linker

.s

Compiler.o

Linker

Page 4: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

Motivation

• Why direct object code emission?

• Bypass the external assembler.

• Speed-up compile time.

Page 5: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

Background

• Current code emission:

• Asm printers

• JIT engine

Page 6: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

Asm Printer

• AsmPrinter

• Instructions are described on .td files.

• Auto-generated method is used to print instructions.

Page 7: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

Asm Printer

void X86AsmPrinter::printMCInst(const MCInst *MI) { if (MAI->getAssemblerDialect() == 0) X86ATTInstPrinter(O, *MAI).printInstruction(MI); else X86IntelInstPrinter(O, *MAI).printInstruction(MI);}

Auto-generated method

Page 8: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

JIT

• JIT emits binary code.

• Blobs are emitted to memory by a target specific code emitter class.

Page 9: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

JIT

• The code is emitted per-function

X86 Emitter

PPC Emitter

MachineFunctionPass

...

Page 10: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

JIT

• Only PPC has a auto-generated code emitter.

...class Emitter : public MachineFunctionPass { ... bool runOnMachineFunction(MachineFunction &MF);

void emitInstruction(const MachineInstr &MI, const TargetInstrDesc *Desc);...

Page 11: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

MachineCodeEmitter

• The actual binary code emission is done by calls to the MachineCodeEmitter.

void ...emitInstruction(const MachineInstr &MI, ...) {

// Emit the lock opcode prefix as needed. if (Desc->TSFlags & X86II::LOCK) MCE.emitByte(0xF0); ...

MachineCodeEmitter

Page 12: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

JITCodeEmitter• JIT code emission is implemented in the

JITCodeEmitter.

• A specialization from MCE.

• Implement methods to actually write to memory:

emitByte(..)emitULEB128Bytes(..)emitDWordLE(...)emitAlignment(...)

Page 13: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

Object Code

• Object Code support is implemented using this scenario.

• Specialize the MCE as JIT does.

• MCE is an instance of ObjectCodeEmitter.

Page 14: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

Object Code

• The specific formats (e.g. ELF) are specializations of ObjectCodeEmitter.

ELFCodeEmitter

COFFCodeEmitter

ObjectCodeEmitter

...

Page 15: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

Object Code

• Blobs of code and data are written to BinaryObjects.

• High level abstraction of "Sections" or "Segments".

class ELFSection : public BinaryObject { public: ...

Page 16: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

ELFCodeEmitter• Handling of ConstantPools and Jumptables.

• On each binary format a different section.

• Generic target relocations to ELF specific ones.

llvm::reloc_absolute_word

R_X86_64_32

Page 17: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

ELFCodeEmitter

• The ELFCodeEmitter emits code to BinaryObjects.

llvm MCE.emitByte(0xF0)

ELFCodeEmitter::emitByte(..)

BinaryObject::emitByte(..).text

Page 18: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

ELFWriter

• Emits the symbol table, string table, header and relocations into binary objects.

• Dump binary objects to a final file.

Page 19: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

Limitations

• Inline assembly not handled by emitters.

• That demands an assembly parser.

• Solution: llvm-mc.

Page 20: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

llvm-mc

• Machine code driver.

• Current playground for an assembly parser, assembler and disassembler.

Page 21: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

llvm-mc

• Goals:

• Extract all info from .td files.

• Auto-generate a assembler, disassembler and code generator.

• Integrate the assembler into the compiler.

Page 22: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

llvm-mc

• Goals:

• At least ~20% speedup at “-O0 -g”.

• Share binary writers code base as much as possible among different formats.

Page 23: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

llvm-mc

• In progress:

• Parse a assembly file and dump the Lex tokens.

.data

.ascii "hello"

identifier: .dataEndOfStatementidentifier: .asciistring: "hello"EndOfStatement

-as-lex

Page 24: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

llvm-mc• In progress:

• Parse and assemble a .s file, emitting asm again or object code.

$ llvm-mc -assemble -output-asm-variant=0 -show-encoding x86.s

.section __TEXT,__text,regular,pure_instructions subb %al, %al # encoding: [0x28,0xc0] addl $24, %eax # encoding: [0x83,0xc0,0x18]

Page 25: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

llvm-mc

• In progress:

• A complete assembler: includes relaxation phases, which allows late optimizations

• Example: Jump instruction encoding on x86.

Page 26: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

llvm-mc

• In progress:

• Interactive disassembler: makes easier to write regression tests for instruction encoding

$ llvm-mc -disassemble

74 22

1 instruction:74 22 je 34

user input

Page 27: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

llvm-mc

• Architecture

• The asm parser emits code through a generic streamer, MCStreamer.

• The streamer is specialized to emit asm or object code.

Page 28: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

llvm-mc

MCAsmStreamer

MCMachOStreamer

MCStreamer

EmitInstruction()

EncodeInstruction()

MachObjectWriter()

AsmParser.Run()

WriteObject()

.o

.s

Page 29: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

llvm-mc

• Current limitations

• Quite new and experimental.

• Demands lots of clean up and refactoring.

• Hardcoded for MachO.

Page 30: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

llvm-mc

• Current limitations

• ELF emission is not integrated into the llvm-mc architecture.

• ELF assembly parsing bits not implemented.

• The Assembly printer is not entirely converted to use MCAsmStreamer.

Page 31: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

llvm-mc

• MCStreamer future:

• Support other binary formats.

• New specializations for: JIT, dwarf EH and debug info.

Page 32: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

llvm-mc

• MCStreamer future:

• JIT and asm printers will eventually be merged into only one “emitter”.

• “-S” could generate “verbose assembly” by default (loop depth, encoding info, ...)

Page 33: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

Future Design

• Printing .s

MCAsmStreamer

MCStreamer

CodeGen Code Emitter

.s

Page 34: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

• JIT

MCJITStreamer

MCStreamer

CodeGen Code Emitter

Memory

Future Design

Page 35: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

• .o writing

MCObjectStreamer

MCStreamer

CodeGen Code Emitter

.oAssemblerBackend

ELF, MachO, ...

Future Design

Page 36: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

• Inline asm for .o file writing

MCObjectStreamer

CodeGen Code Emitter

.o

...AsmParser

MCStreamer

ELF, MachO, ...

AssemblerBackend

Future Design

Page 37: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

MCMachOWriter

Relaxation

Layout

.o

MCELFWriter ...

...

Future DesignAssemblerBackend

Page 38: Bruno Cardoso Lopes bruno.cardoso@gmail.com LLVM Developers

Object Code Emission& llvm-mc

Bruno Cardoso [email protected]

Questions?