Building An Efficient JIT
the best kind of jit
Nate Begeman
Apple Inc
August 1st, 2008
Overview
• Resources
• JIT Basics
• Clang-based JIT
• An Efficient Clang-based JIT
Resources
Memory, Cycles, Etc.
How much are we using?
CPU
CPU TOTAL RPRVT
myjit 50ms 100ms 41928K
TOTAL
CPU TOTAL RPRVT
myjit 50ms 100ms 41928K
RPRVT
CPU TOTAL RPRVT
myjit 50ms 100ms 41928K
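The three columns appear to be CPU time, total elapsed time, and resident private memory (RPRVT is the column name `top` uses on Mac OS X). As a rough sketch of how the first two could be gathered from inside a program, the following times a piece of work with the standard library. The names `Timing` and `measure` are hypothetical, and memory is left out because reading it is platform-specific.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <functional>

// Hypothetical helper type: not part of LLVM or of the original slides.
struct Timing {
  double cpu_ms;    // processor time consumed by this process
  double total_ms;  // elapsed wall-clock time
};

// Run `work` once and report both CPU time and elapsed time.
inline Timing measure(const std::function<void()> &work) {
  const std::clock_t cpu_start = std::clock();
  const auto wall_start = std::chrono::steady_clock::now();
  work();
  const std::clock_t cpu_end = std::clock();
  const auto wall_end = std::chrono::steady_clock::now();

  Timing t;
  t.cpu_ms = 1000.0 * static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
  t.total_ms =
      std::chrono::duration<double, std::milli>(wall_end - wall_start).count();
  return t;
}
```

Wrapping one compile-and-run iteration of the JIT in `measure` would give numbers comparable in spirit to the CPU and TOTAL columns, though not necessarily identical to what an external monitor reports.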
JIT Basics
JIT 101
Common JIT Tasks
What does a JIT do?
• Create IR
• Loading Libraries
• Link Modules
• Optimization & Transforms
• Codegen
Loading Libraries
#include "llvm/Bitcode/ReaderWriter.h"
// ParseBitcodeFile - Read the specified bitcode file,
// returning the Module.
Module *ParseBitcodeFile(MemoryBuffer *buffer, ...);
#include "llvm/System/DynamicLibrary.h"
// LoadLibraryPermanently - Load the dynamic library at
// path. It will be unloaded when the program terminates.
bool LoadLibraryPermanently(const char *path, ...);
Generating IR
#include "llvm/Module.h"
// No default constructor, must provide name.
Module(const std::string &ModuleID);
You probably also want to...
#include "llvm/Module.h"
/// Set the data layout.
void setDataLayout(const std::string& DL);
/// Set the target triple.
void setTargetTriple(const std::string &T);
In Your Module...
• Functions
- Declarations
- Definitions
• Globals
• Annotations
Linking
• What?
• Why?
Linking
#include "llvm/Linker.h"
/// LinkModules - The Src module is linked into the Dst module
/// such that types, global variables, functions, etc. are
/// matched and resolved.
static bool LinkModules(Module* Dst, Module* Src, ...);
Destroys Src...but not its memory!
Opts & Transforms
• No one correct level of optimization
• Create your own PassManager(s)
Opts & Transforms
#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/IPO.h"
std::vector<const char *> exportList;
PassManager Passes;
Passes.add(new TargetData(M));
Passes.add(createInternalizePass(exportList));
Passes.add(createScalarReplAggregatesPass());
Passes.add(createInstructionCombiningPass());
Passes.add(createGlobalOptimizerPass());
Passes.add(createFunctionInliningPass());
All these passes and more are yours for one low price!
Inlining
Constant *Fn = M->getFunction("inlineMe");
for (Value::use_iterator ui = Fn->use_begin(), ue = Fn->use_end();
     ui != ue; ) {
  if (CallInst *CI = dyn_cast<CallInst>(*ui++))
    InlineFunction(CI);
}
#include "llvm/Transforms/Utils/Cloning.h"
/// InlineFunction - Performs one level of inlining.
bool InlineFunction(CallInst *C);
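Note that the loop advances `ui` inside the `dyn_cast` expression, before `InlineFunction` runs. Inlining removes the call, and with it that entry in the function's use list, so an iterator still pointing at it would presumably be left dangling. The same advance-then-mutate idiom can be sketched with a `std::list` standing in for the use list; the function name `remove_evens` is hypothetical and nothing here is LLVM API.

```cpp
#include <cassert>
#include <list>

// Stand-in for a use list; the real LLVM use list is a different structure.
// Removes every even value while iterating, advancing the iterator first.
inline int remove_evens(std::list<int> &uses) {
  int removed = 0;
  for (std::list<int>::iterator it = uses.begin(), end = uses.end();
       it != end; ) {
    std::list<int>::iterator current = it++;  // advance before mutating
    if (*current % 2 == 0) {
      uses.erase(current);  // only `current` is invalidated; `it` stays valid
      ++removed;
    }
  }
  return removed;
}
```

Had the increment been placed in the `for` header instead, the erase would invalidate the very iterator the loop is about to increment.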
Code Generation
• Get a function pointer from JIT
• Release when finished.
#include "llvm/ExecutionEngine/ExecutionEngine.h"
/// getPointerToFunction - This returns the address of the
/// specified function, compiling it if necessary.
void *getPointerToFunction(Function *F);
/// freeMachineCodeForFunction - deallocate memory used to
/// code-generate this Function.
void freeMachineCodeForFunction(Function *F);
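Because `getPointerToFunction` is declared to return `void *`, the caller has to cast the result to the right function-pointer type before calling it. A minimal sketch of that step, with no LLVM involved: `lookup_function` is a hypothetical stand-in for the JIT lookup, and the `int (*)(int)` signature is an assumption (the slides use the name `jitfn` later but do not show its definition).

```cpp
#include <cassert>

// Hypothetical signature for the compiled function; assumed for illustration.
typedef int (*jitfn)(int);

static int add_one(int x) { return x + 1; }

// Stand-in for a JIT lookup: hands back an untyped address, the way the
// getPointerToFunction declaration above returns void *.
inline void *lookup_function() {
  return reinterpret_cast<void *>(&add_one);
}

inline int call_through_pointer(int arg) {
  // The caller must know the real signature; the cast is unchecked.
  jitfn fn = reinterpret_cast<jitfn>(lookup_function());
  return fn(arg);
}
```

Nothing verifies the cast, so a mismatch between the assumed signature and the generated code would only show up at run time.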
Practical Example
JIT 102
A Clang-based JIT
Assembling the JIT
[Diagram: C99, libclang]
/// c_to_module - entry point into libclang, our shared library
/// that uses clang to turn a C string into an LLVM IR Module.
extern "C" Module *c_to_module(const char *source, char **log);
int main(int argc, char **argv) {
  ...
  const char *source = getFile(path);
  ...
  for(;;) {
    // Create a new module from the source string
    Module *M = c_to_module(source, 0);
Assembling the JIT
[Diagram: C99, libclang, parse bitcode]
#include "llvm/Support/MemoryBuffer.h"
    // Load my library of fancy extensions to Libc
    MemoryBuffer *buffer = MemoryBuffer::getFile("mylibc.bc");
    Module *Library = ParseBitcodeFile(buffer);
Assembling the JIT
[Diagram: C99, libclang, parse bitcode, Linker]
    // Link the modules together so that we can do inlining.
    // After this step, 'M' will contain all the bitcode.
    Linker::LinkModules(M, Library, 0 /* error string */);
    // Don’t forget to delete Library, otherwise we’ll leak
    // memory.
    delete Library;
Assembling the JIT
[Diagram: C99, libclang, parse bitcode, Linker, Optimizer]
    // Register some passes with a PassManager, and run them.
    PassManager Passes;
    Passes.add(new TargetData(M));
    Passes.add(createInternalizePass(exportList));
    Passes.add(createGlobalDCEPass());
    Passes.add(createGlobalOptimizerPass());
    Passes.add(createScalarReplAggregatesPass());
    Passes.add(createInstructionCombiningPass());
    Passes.run(*M);
Assembling the JIT
[Diagram: C99, libclang, parse bitcode, Linker, Optimizer, Code Generator]
    // Create the JIT
    ExistingModuleProvider *EMP;
    EMP = new ExistingModuleProvider(M);
    ExecutionEngine *JIT = ExecutionEngine::create(EMP);
    Function *F = M->getFunction("myCosine");
    // Cast function pointer to correct type and call.
    jitfn fn = (jitfn)JIT->getPointerToFunction(F);
    printf("foo! %d\n", fn(5));
Success!
...but at what price?
[Diagram: C99, libclang, parse bitcode, Linker, Optimizer, Code Generator]
Demo
JIT 102 Results
Slow compile times!
CPU TOTAL RPRVT
jit102 600ms 800ms 31500K
JIT 102 Results
CPU TOTAL RPRVT
jit102 600ms 800ms 31500K
Too much memory!
Efficient JITing
JIT 201
Don’t do what we just taught you in 101!
What does a JIT do?
• Loading Libraries
• Create IR
• Link Modules
• Optimization & Transforms
• Codegen
Areas To Optimize
• Loading Libraries
• Create IR
• Link Modules
• Optimization & Transforms
• Codegen
Loading Libraries
#include "llvm/Bitcode/ReaderWriter.h"
/// getBitcodeModuleProvider - Read the header of the specified
/// bitcode buffer and prepare for lazy deserialization of
/// function bodies.
ModuleProvider *getBitcodeModuleProvider(MemoryBuffer *Buffer);
#include "llvm/Bitcode/ReaderWriter.h"
/// materializeFunction - make sure the given function is fully
/// read.
bool materializeFunction(Function *F);
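The idea behind these two calls is that only a header is read up front and each function body is decoded on first use. A toy model of that behavior, written without LLVM, might look like the following. `LazyLibrary` and its methods are hypothetical names, strings stand in for encoded function bodies, and the `true`-on-success return value is this sketch's own convention rather than a statement about `materializeFunction`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy model of lazy deserialization; not the LLVM ModuleProvider API.
class LazyLibrary {
public:
  // `serialized` maps a function name to its (pretend) encoded body.
  explicit LazyLibrary(const std::map<std::string, std::string> &serialized)
      : serialized_(serialized) {}

  // Decode one function body on first request; later calls reuse it.
  // Returns false if the library has no function with that name.
  bool materialize(const std::string &name) {
    if (materialized_.count(name)) return true;
    std::map<std::string, std::string>::const_iterator it =
        serialized_.find(name);
    if (it == serialized_.end()) return false;
    materialized_[name] = it->second;  // stands in for real decoding work
    return true;
  }

  // Bytes actually held in decoded form so far.
  std::size_t materialized_bytes() const {
    std::size_t total = 0;
    for (const auto &entry : materialized_) total += entry.second.size();
    return total;
  }

private:
  std::map<std::string, std::string> serialized_;
  std::map<std::string, std::string> materialized_;
};
```

In this model a program that calls one function out of a large library pays the decoding cost for that one body only, which is the saving the lazy provider is after.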
Loading Libraries
• What if I don’t need to inline?
• Compile your runtime library to a dynamic library!
Loading Libraries
    // Before: eager load, as in JIT 102.
    // Load my library of fancy extensions to Libc
    MemoryBuffer *buffer = MemoryBuffer::getFile("mylibc.bc");
    Module *Library = ParseBitcodeFile(buffer);
    // After: lazy load.
    // Create the runtime library module provider, which will
    // lazily stream functions out of the module.
    MemoryBuffer *buffer = MemoryBuffer::getFile("mylibc.bc");
    ModuleProvider *LMP = getBitcodeModuleProvider(buffer);
    Module *LM = LMP->getModule();
Linking
Module *Library = Provider->getModule();        // 300KB + 20MB
Module *MyFunc = compileWithClang(someSource);  // 50KB
// A common mistake, links Library into MyFunc.
// Materializes & copies 20MB of IR into 50KB module => 40MB!!
LinkModules(MyFunc, Library);
// Much better, link MyFunc into Library, and delete contents
// later. 50KB into 300KB => 350KB. 100x improvement.
LinkModules(Library, MyFunc);
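The figures in those comments can be checked with a little arithmetic. The sketch below uses the sizes quoted on the slide; reading the 40MB as 20MB materialized plus 20MB copied is an interpretation, and the constant and function names are hypothetical.

```cpp
#include <cassert>

// Sizes in KB, taken from the slide's comments.
const long kLibraryHeaderKB = 300;        // lazily loaded library, bodies unread
const long kLibraryBodiesKB = 20 * 1024;  // function bodies if fully materialized
const long kMyFuncKB = 50;                // freshly compiled module

// Library -> MyFunc: every body is materialized, then copied into MyFunc.
inline long cost_link_library_into_myfunc() {
  return kLibraryBodiesKB /* materialized */ + kLibraryBodiesKB /* copied */;
}

// MyFunc -> Library: only the small module is copied next to the header.
inline long cost_link_myfunc_into_library() {
  return kLibraryHeaderKB + kMyFuncKB;
}
```

Under this simplified accounting the two directions come to 40960KB and 350KB, a ratio of about 117, consistent with the slide's round figure of 100x.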
Linking
• What is “GhostLinkage” ?
• Upcoming Improvements.
Opts & Transforms
• Run Internalize, DCE, and instcombine on clang-generated code.
• Just run IPO opts after link.
Demo & Bakeoff!
Final Results
20x faster compile times!
CPU TOTAL RPRVT
jit201 30ms 40ms 3900K
Final Results
CPU TOTAL RPRVT
jit201 30ms 40ms 3900K
7x less memory!
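For reference, the before/after ratios can be recomputed from the jit102 and jit201 rows shown in the results tables. The struct and function names below are hypothetical; only the six numbers come from the slides.

```cpp
#include <cassert>

// One row of the results tables on the slides.
struct Result {
  double cpu_ms;
  double total_ms;
  double rprvt_kb;
};

const Result kJit102 = {600.0, 800.0, 31500.0};  // naive Clang-based JIT
const Result kJit201 = {30.0, 40.0, 3900.0};     // after the optimizations

// How many times smaller `after` is than `before`.
inline double ratio(double before, double after) { return before / after; }
```

With these values both time columns improve by a factor of 20, and the memory column by a factor of roughly 8, in the same range as the slide's headline figure.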
Documentation
• llvm / [email protected]
• clang / [email protected]
• web archives of lists available
Questions & Answers