optimizing builds - llvm
TRANSCRIPT
OPTIMIZING BUILDSON WINDOWS
SOME PRACTICAL CONSIDERATIONS
Alexandre Ganea, [email protected]
2019 Bay Area LLVM Developers' Meeting, Oct.22-231
5PART 1 – PREAMBLE
Editor Build 20,000 .CPP25,000 .H
23 GB .OBJ9 GB .DEBUG$T
10 M TYPE RECORDS42 M SYMBOLS
300 M .EXE2 GB .PDB
Windows 10Fastbuild, distributedAlways Unity builds
Concurent AAA games 20 – 25LoC/game 30 - 50 M
Programmers/title 100 – 250Code Changes/day 100 – 150
(peak:400)
Build targets/platform 5 – 6Platforms/Game 4+
Code workspace 70 - 100 GBData workspace 100 - 200 GB
Game builds/day 100 – 150Stripped Build 1 - 6 GBFinal Build 50 - 90 GB
Game production constraints @ Ubisoft
6PART 1 – PREAMBLE
08 min 50 sec
08 min 33 sec
08 min 50 sec
08 min 20 sec
07 min 00 sec
04 min 00 sec
04 min 15 sec
10 min 20 sec
06 min 46 sec
01 min 18 sec
43 sec
29 sec
29 sec
29 sec
00 min 00 sec 02 min 53 sec 05 min 46 sec 08 min 38 sec 11 min 31 sec 14 min 24 sec 17 min 17 sec 20 min 10 sec
2017 (MSVC)
2018 (MSVC)
Fall 2018 (MSVC + LLD)
2019 (MSVC + LLD)
2019 (Clang)
100% cache hit, local SSD
100% cache hit, 1 Gpbs network
AAA GAME, CLEAN REBUILDX64 EDITOR RELEASE (FASTBUILD)
Compiler Linker
9
clang-cl /E md5sum curl https://store/ clang-cl
clang-scan-deps
while read x; do
md5sum $x;
done
deps.txt
a.cpp
a.cpp
found
not found
5-10 sec
0.02 sec 0.02 sec
FASTBUILD CACHE READ ALGORITHM
PART 2 – EXPERIMENTS
deps+MD5.txt
10
06 min 10 sec
04 min 05 sec
35 sec
40 sec
40 sec
40 sec
VS2017 15.9.16
Network cache
Network cache + clang-scan-deps
100% NETWORK CACHE HITSAAA GAME, X64 EDITOR RELEASE (FASTBUILD)
Compiler/Cache Linker
PART 2 – EXPERIMENTS
11
clang-scan-deps+ network cache
LLD(MSVC OBJs + ghash)
Intel Xeon W-2135 @ 3.7 GHz, 128 GB, NVMe SSD, 1Gbps Network
7 GB –> 22.6 GB50k files
PART 2 – EXPERIMENTS
(ms)
Title of Document 13
11.5% process time
CLANG-SCAN-DEPS STANDALONE(50K FILES)
avg ~90% cpu
Intel Xeon W-2135 @ 3.7 GHz (6-core), 128 GB, NVMe SSD
16PART 2 – EXPERIMENTS
sizeof(std::error_code) -> 16 bytes
sizeof(llvm::ErrorOr<DirectoryEntry&>) -> 24 bytes
sizeof(llvm::StringMapEntry<llvm::ErrorOr<DirectoryEntry&>>) –> 32 bytes (+string contents)
DOWN THE RABBIT HOLE
17
nullptr
nullptr
nullptr
0x15f238a92
nullptr
nullptr
nullptr
NumBuckets 0
0
0
0x12345678
0
0
0
uint32_t
StringMapEntry*
NumBuckets
count
value
string
size_t
T
count
PART 2 – EXPERIMENTS
STRINGMAP: MEMORY LAYOUT
19
0.0%
10.0%
20.0%
30.0%
40.0%
50.0%
60.0%
70.0%
1 5 9 13 17 21 25 29 33 37 41 45 49
60
.2%
14.7
%8
.0%
5.3
%3
.5%
1.7%
0.8
%0
.5%
0.2
%0
.1%0
.1%
Hash collisions / call
0.0%
10.0%
20.0%
30.0%
40.0%
50.0%
60.0%
70.0%
80.0%
90.0%
1 5 9 13 17 21 25 29 33 37 41 45
79.4
%11
.7%
3.5
%1.6
%1.0
%0
.5%
0.3
%0
.2%
0.1%
0.1%
Cachelines hit / call
187 M samplesPART 2 – EXPERIMENTS
STRINGMAP STATS
24
LINK AAA GAME, X64 EDITOR RELEASE(22.8GB MSVC OBJS)
VS2019 16.2
LLD 9.0
LLD 8 + // GHASH
Intel Xeon W-2135 @ 3.7 GHz (6-core), 128 GB, NVMe SSD
PART 2 – EXPERIMENTS
58 sec
62 sec
49 sec
25
19.21 s
7.42 s
5.29 s
4.20 s
.0 s 5.0 s 10.0 s 15.0 s 20.0 s 25.0 s
Clang 9.0, no Ghash
Clang 8.0 + // Ghash (12-byte buckets)
Clang 8.0 + // Ghash (8-byte buckets)
Clang 8.0 + // Ghash (8-byte buckets) + 2MB pages
GHash
uint64_t
TypeIndex
uint32_t
GHash
uint64_t
TypeIndex
PART 2 – EXPERIMENTS
29
int main(int argc_, const char **argv_) {noteBottomOfStack();llvm::InitLLVM X(argc_, argv_);SmallVector<const char *, 256> argv(argv_, argv_ + argc_);
if (llvm::sys::Process::FixupStandardFileDescriptors())return 1;
llvm::InitializeAllTargets();return ClangDriverMain(argv);
}
int ClangDriverMain(SmallVectorImpl<const char *>& argv) {static LLVM_THREAD_LOCAL bool EnterPE = true;if (EnterPE) {llvm::sys::DynamicLibrary::AddSymbol("ClangDriverMain", (void*)(i..EnterPE = false;
} else {llvm::cl::ResetAllOptionOccurrences();
}
auto TargetAndMode = ToolChain::getTargetAndModeFromProgramName(arg..
clang/tools/driver/driver.cpp int Command::Execute(ArrayRef<llvm::Optional<StringRef>> Redirects,std::string *ErrMsg, bool *ExecutionFailed) const {
[...]typedef int (*ClangDriverMainFunc)(SmallVectorImpl<const char *> &);ClangDriverMainFunc ClangDriverMain = nullptr;
[...]if (ClangDriverMain) {[...]llvm::CrashRecoveryContext CRC;CRC.EnableExceptionHandler = true;
const void *PrettyState = llvm::SavePrettyStackState();
int Ret = 0;auto ExecuteClangMain = [&]() { Ret = ClangDriverMain(Argv); };
if (!CRC.RunSafely(ExecuteClangMain)) {llvm::RestorePrettyStackState(PrettyState);return CRC.RetCode;
}return Ret;
} else {auto Args = llvm::toStringRefArray(Argv.data());return llvm::sys::ExecuteAndWait(Executable, Args, Env, Redirects,
/*secondsToWait*/ 0,/*memoryLimit*/ 0, ErrMsg,ExecutionFailed);
}}
clang/lib/driver/Job.cpp
PART 2 – EXPERIMENTS
MAKING CC1 REENTRANT
31
34 min 00 sec
28 min 00 sec
12 min 00 sec
32 min 30 sec
30 min 16 sec
13 min 10 sec
22 min 46 sec
19 min 54 sec
07 min 10 sec
6-core - W10 build 1803
6-core - W10 build 1903
36-core - W10 build 1709
BYPASSING THE CC1 PROCESSCLEAN REBUILD LLVM, CLANG & LLD
VS2019 16.2 Clang 9.0 Clang 9.0 + cc1 bypass
PART 2 – EXPERIMENTS
35
$ LD_PRELOAD=/path/to/my/malloc.so /bin/ls
#include "rpmalloc/rpmalloc.c"
extern "C" {_ACRTIMP _CRTRESTRICT void *malloc(size_t size) {
return rpmalloc(size);}
_ACRTIMP void free(void *p) { rpfree(p); }
_ACRTIMP _CRTRESTRICT void *calloc(size_t n, size_t elem_size) {return rpcalloc(n, elem_size);
}_ACRTIMP _CRTRESTRICT void *realloc(void *ptr, size_t size) {
return rprealloc(ptr, size);}}
// Bypass CRT debug allocator#ifdef _DEBUGvoid *operator new(decltype(sizeof(0)) n) noexcept(false) { return malloc(n); }void __CRTDECL operator delete(void *const block) noexcept { free(block); }
void *operator new[](std::size_t s) throw(std::bad_alloc) { return malloc(s); }void operator delete[](void *p) throw() { free(p); }#endif
https://github.com/mjansson/rpmalloc
llvm/lib/Support/Windows/Memory.inc
PART 2 – EXPERIMENTS
REPLACING THE CRT ALLOCATOR
36
57 min 00 sec
20 min 13 sec
16 min 19 sec
37 min 12 sec
> 1 h 30 min
03 min 57 sec
VS 2017 15.9.16
Clang 9.0 ThinLTO
Clang 9.0 ThinLTO + rpmalloc
THINLTO (CLEAN REBUILD)RAINBOW 6: SIEGE, PC GAME PROFILE
6-core (W10 build 1903) 36-core (W10 build 1709)
PART 2 – EXPERIMENTS
PART 3 – PROPOSAL 38
FASTBUILD
clang.exe
lld-link.exe
llvm-tblgen.exe
clang-tblgen.exe
llvm-lib.exe
ml64.exe (masm)
rc.exe
cmake.exe
PREVIOUS BUILD PROCESS
PART 3 – PROPOSAL 41
FASTBUILD
ml64.exe (masm)
rc.exe
cmake.exe
LLVM-BUILDOZER
clang.exe
lld-link.exe
llvm-tblgen.exe
clang-tblgen.exe
llvm-lib.exe
BUILDING WITH BUILDOZER
PART 3 – PROPOSAL 42
FASTBUILD
ml64.exe (masm)
rc.exe
cmake.exe
Worker 1 Worker 2 Worker 3 Worker 4 Worker 5
Local
Local Local
43
int buildozer::ImportEXE(llvm::StringRef EXE) {[..]HINSTANCE H = LoadLibraryA(EXE.data());if (!H)return 0;
RemapIAT(H);InitDebInfo();PatchRPMalloc(M);InitializeStaticTLS(H);InitializeCRT(M);FindEntryPoints(M);[..]
}
PART 3 – PROPOSAL
RUNNING THE DOZER
”LoadLibrary can also be used to load other executable modules.[..]However, do not use LoadLibrary to run an .exe file.Instead, use the CreateProcess function.” (MSDN)
44
int buildozer::ImportEXE(llvm::StringRef EXE) {[..]HINSTANCE H = LoadLibraryA(EXE.data());if (!H)return 0;
RemapImportAddressTable(H);InitDebInfo();PatchRPMalloc(M);InitializeStaticTLS(H);InitializeCRT(M);FindEntryPoints(M);[..]
}
PART 3 – PROPOSAL
RUNNING THE DOZER
45
Pool.emplace(NumWorkers, [&]() {while (true) {
buildozer::WorkUnit *WU = AcquireWork(..);if (!WU)
break;
int Mod = IdentifyMOD(WU);
llvm::CrashRecoveryContext CRC;CRC.RunSafely([&] {
buildozer::Launch(Mod, WU->Directory, WU->Arguments);});[..]
}});Pool.join();
PART 3 – PROPOSAL
RUNNING THE DOZER
47
19 min 34 sec
11 min 53 sec
Clang 9.0
Buildozer
Local build, AAA game, x64 Editor Release
PART 3 – PROPOSAL
Intel Xeon W-2135 @ 3.7 GHz (6-core), 128 GB, NVMe SSD
SHORT TERM
PART 4– NEXT STEPS 49
• Remove OS jitter (in-RAM file content & stat cache)
• OBJ cache (in-RAM)
• Clang-LLD in-memory bridge
• Incrementally link along the way
• Incrementally compile along the way (SN Systems’ Program Repository)
• Remote API for distribution & caching
PLATFORM
LONG TERM
PART 4– NEXT STEPS 51
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
LONG TERM
PART 4– NEXT STEPS 52
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
LONG TERM
PART 4– NEXT STEPS 53
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
DAILY COMMITS
LONG TERM
PART 4– NEXT STEPS 54
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
DAILY COMMITS
ACTIVE BRANCHES
LONG TERM
PART 4– NEXT STEPS 55
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
DAILY COMMITS
ACTIVE BRANCHES
GAME PRODUCTION
LONG TERM
PART 4– NEXT STEPS 56
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
PLATFORM
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
BUILD TARGETBUILD TARGET
DAILY COMMITS
ACTIVE BRANCHES5 min
x6
x6
x100
x4
GAME PRODUCTIONx20