P4LLVM: An LLVM based P4 Compiler
Tharun Kumar Dangeti, Venkata Keerthy Soundararajan, Ramakrishna Upadrasta
Indian Institute of Technology Hyderabad
First P4 European Workshop - P4EU
September 24th, 2018

Outline
● P4LLVM - LLVM based P4 compiler
  ○ Better optimizations ⇒ better runtime performance of the network
● Frontend
  ○ P4-16 code ➝ LLVM Intermediate Representation (IR)
● Backend
  ○ LLVM IR ➝ Architecture code
● JSON Backend
  ○ BMv2 is a software switch for prototyping purposes
  ○ The input to BMv2 is a JSON file
● P4LLVM - Current support
  ○ P4-16 ➝ LLVM IR ➝ JSON (BMv2 target)

P4LLVM: Why?
● Input (switch) configuration affects the performance
  ○ Need for stronger compiler optimizations
● Compiler Optimizations
  ○ Theory: strong
  ○ Implementation
    ■ Evolves over time
    ■ Developed incrementally, and difficult
● Use existing & active compiler tool-chains
  ○ Mature enough
  ○ Relatively bug-free
  ○ Community support

What is LLVM?
● Compiler infrastructure
● Written in C++
● Designed for compile-time, link-time, and run-time optimizations.
● License: Allows LLVM to be used and sold in commercial tools.
● Flexible: Designed to be used like an API/library
  ○ GCC: more monolithic, not designed to be used as a library
[Diagram: C, C++, Fortran ➝ Frontend ➝ LLVM IR ➝ Optimizer ➝ Optimized LLVM IR ➝ Backend ➝ x86, ARM, SPARC]

What is LLVM?
[Diagram: C ➝ Frontend ➝ LLVM IR ➝ Optimizer ➝ Optimized LLVM IR ➝ Backend ➝ target assembly]
C source

    int sum(int a, int b) {
        return a + b;
    }
    int main() {
        int a = sum(1, 3);
        return a;
    }

LLVM IR produced by the frontend

    ; ModuleID = 'file.c'
    target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
    target triple = "x86_64-unknown-linux-gnu"

    ; Function Attrs: nounwind uwtable
    define i32 @sum(i32 %a, i32 %b) #0 {
      %1 = alloca i32, align 4
      %2 = alloca i32, align 4
      store i32 %a, i32* %1, align 4
      store i32 %b, i32* %2, align 4
      %3 = load i32, i32* %1, align 4
      %4 = load i32, i32* %2, align 4
      %5 = add nsw i32 %3, %4
      ret i32 %5
    }

    ; Function Attrs: nounwind uwtable
    define i32 @main() #0 {
    ...

Assembly produced by the backend

        .text
        .file "file.c"
        .globl sum
        .p2align 4, 0x90
        .type sum,@function
    sum:                                    # @sum
        .cfi_startproc
    # BB#0:                                 # %entry
        pushq %rbp
    .Lcfi0:
        .cfi_def_cfa_offset 16
    .Lcfi1:
        .cfi_offset %rbp, -16
        movq %rsp, %rbp
    .Lcfi2:
        .cfi_def_cfa_register %rbp
        movl %edi, -4(%rbp)
        movl %esi, -8(%rbp)
        ...

A closer look at P4’s compiler
[Diagram: P4 program ➝ Frontend ➝ IR ➝ Midend ➝ IR ➝ Backend. Labels: Architecture independent policies; Architecture Details; Control plane APIs; Switch configuration]
Are the LLVM and P4 compiler pipelines similar? YES!

Can P4 use LLVM?
Recall
[Diagram: C, C++, Fortran ➝ Frontend ➝ LLVM IR ➝ Optimizer ➝ Optimized LLVM IR ➝ Backend ➝ x86, ARM, SPARC]
Adding P4 as a source language and P4 targets as backends: YES!

Can P4 use LLVM?
[Diagram: P4 ➝ Frontend ➝ LLVM IR ➝ Optimizer ➝ Optimized LLVM IR ➝ P4 Backend ➝ ???]
P4 code

    #include <core.p4>
    #include <ebpf_model.p4>

    struct Headers_t {}

    parser prs(packet_in p, out Headers_t headers) {
        state start {
            transition accept;
        }
    }

Equivalent LLVM IR

    ; ModuleID = 'p4Code'
    source_filename = "p4Code"

    %struct.Headers_t = type {}

    define i32 @prs(i32** %p, %struct.Headers_t* %headers) {
    entry:
      br label %start

    start:                                  ; preds = %entry
      br label %accept

    accept:                                 ; preds = %start
      ret i32 1

    reject:                                 ; No predecessors!
      ret i32 0
    }

What does the P4 backend emit (???)? We showcase a JSON backend.

Current architecture of P4LLVM
[Figure: P4LLVM architecture diagram]

Representing P4 In LLVM-IR

| P4 Construct | Equivalent LLVM IR Construct |
| --- | --- |
| Data types: Headers, Structs | Struct types |
| Data types: Header Union | Array of structs |
| Primitives: Int and Bit | Int and vector of 1-bit ints |
| Declarations | Alloca instructions |
| Assignments | Store instructions |
| Extern calls: Extract, Verify, SetValid/Invalid, IsValid, Apply | Function declaration and corresponding calls |
| Tables | Similar to Apply |
| Parser, Control, Action, Deparser | Functions |
| Direction: In | Passed by value |
| Direction: Out, InOut | Passed by reference |

P4 IN LLVM-IR: Headers
P4 code

    header hdr {
        int<32> a;
        int<32> b;
        bit<8> c;
    }
    struct Headers {
        hdr h;
    }

Equivalent LLVM IR

    %struct.Headers = type { %struct.hdr }
    %struct.hdr = type { i32, i32, <8 x i1> }
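
The type mapping on this slide can be sketched in a few lines of Python. This is only an illustration, not P4LLVM's implementation: the function names `llvm_type` and `llvm_struct` are hypothetical, and only the rules themselves (int<N> to iN, bit<N> to a vector of N one-bit ints, headers and structs to named struct types) come from the slides.

```python
import re

# Hypothetical helpers; only the mapping rules are taken from the slides.

def llvm_type(p4_type):
    """Map a P4 primitive or named type to an LLVM IR type string."""
    p4_type = p4_type.replace(" ", "")
    m = re.fullmatch(r"int<(\d+)>", p4_type)
    if m:
        return f"i{m.group(1)}"              # int<N> -> iN
    m = re.fullmatch(r"bit<(\d+)>", p4_type)
    if m:
        return f"<{m.group(1)} x i1>"        # bit<N> -> vector of N 1-bit ints
    return f"%struct.{p4_type}"              # header/struct -> named struct type

def llvm_struct(name, fields):
    """Render a P4 header or struct as an LLVM named struct definition.

    `fields` is a list of (p4_type, field_name) pairs.
    """
    body = ", ".join(llvm_type(p4_type) for p4_type, _ in fields)
    return f"%struct.{name} = type {{ {body} }}"

hdr = llvm_struct("hdr", [("int<32>", "a"), ("int<32>", "b"), ("bit<8>", "c")])
headers = llvm_struct("Headers", [("hdr", "h")])
print(headers)
print(hdr)
```

Run as written, the script prints the same two type definitions as the slide, up to whitespace.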

P4 IN LLVM-IR: Parser
P4 code

    parser ParserImpl(...) {
        state parse_ethernet {
            packet.extract(hdr.ethernet);
            transition select(hdr.ethernet.etherType) {
                16w0x88f7: parse_ip;
                default: reject;
            }
        }
        state parse_ip {
            packet.extract(hdr.ip);
            transition select(hdr.ip.version) {
                16w4: accept;
                default: reject;
            }
        }
    }

Equivalent LLVM IR

    parse_ethernet:
      %0 = load %struct.headers, %struct.headers* %hdr
      %1 = getelementptr %struct.headers, %struct.headers* %hdr, i32 0, i32 0
      call void @extract(%struct.ethernet_t* %1)
      %2 = load %struct.headers, %struct.headers* %hdr
      %3 = getelementptr %struct.headers, %struct.headers* %hdr, i32 0, i32 0
      %4 = load %struct.ethernet_t, %struct.ethernet_t* %3
      %5 = load %struct.headers, %struct.headers* %hdr
      %6 = getelementptr %struct.headers, %struct.headers* %hdr, i32 0, i32 0
      %7 = getelementptr %struct.ethernet_t, %struct.ethernet_t* %6, i32 0, i32 2
      %8 = bitcast <16 x i1>* %7 to i16*
      %9 = load i16, i16* %8
      switch i16 %9, label %reject [
        i16 -30473, label %parse_ip
      ]

    parse_ip:
      ...
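
The case value `i16 -30473` in the switch is the same bit pattern as the P4 constant `16w0x88f7`, because LLVM prints integer constants as signed values. A minimal Python check of that correspondence follows; the helper names `as_signed` and `as_unsigned` are made up for this sketch.

```python
def as_signed(value, bits):
    """Reinterpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def as_unsigned(value, bits):
    """Return the unsigned value of the low `bits` bits of `value`."""
    return value & ((1 << bits) - 1)

print(as_signed(0x88F7, 16))         # constant from the P4 select case
print(hex(as_unsigned(-30473, 16)))  # constant printed in the LLVM switch
```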

P4 IN LLVM-IR: apply table
P4 code

    control ingress() {
        table forward_table {
            actions = {
                forward;
                _drop;
            }
            key = {
                hdr.ethernet.dstAddr: exact;
            }
            size = 4;
        }
        apply {
            forward_table.apply();
        }
    }

Equivalent LLVM IR

    call void @apply_forward_table("exact", i48 %7, @forward, @_drop, @NoAction, i8* %8, i32 4, @NoAction)

Code generation
● With LLVM, the already available backends can be reused
  ○ Supports most mainstream CPUs and GPUs.
● LLVM provides a sophisticated framework for adding backends
● We have developed a JSON backend to show the effectiveness of optimizations
● Can target only v1model as of now
● Can be extended to any new switch models like PSA

Optimizations
● Numerous opportunities
  ○ Removing unused headers
    ■ Dead code elimination - DCE
  ○ Merging states with a single fan-out in parser
    ■ Simplify CFG
  ○ Eliminating dead/unreachable states
    ■ Aggressive DCE
  ○ Removing unutilized extract calls
● SSA based optimizations
  ○ SSA form simplifies and enables a large set of optimizations
● Some trivial optimizations - available in P4C
  ○ Dead-state elimination
  ○ Constant propagation
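
Two of the opportunities listed above, eliminating unreachable states and merging states with a single fan-out, can be illustrated on a toy parser graph. The Python sketch below is an assumption-laden model, not LLVM's SimplifyCFG or ADCE: a parser is just a dictionary from state name to successor names, the function names are hypothetical, and a real merge would also concatenate the statements of the two states.

```python
def reachable(transitions, start="start"):
    """Return the set of states reachable from `start` (depth-first walk)."""
    seen, stack = set(), [start]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        stack.extend(transitions.get(state, []))
    return seen

def drop_unreachable(transitions, start="start"):
    """Remove states that no path from `start` can reach."""
    live = reachable(transitions, start)
    return {s: succs for s, succs in transitions.items() if s in live}

def merge_single_fanout(transitions, terminals=("accept", "reject")):
    """Fold state B into A when A's only transition goes to B
    and B has no other predecessor."""
    graph = {s: list(succs) for s, succs in transitions.items()}
    changed = True
    while changed:
        changed = False
        preds = {}
        for s, succs in graph.items():
            for t in succs:
                preds.setdefault(t, set()).add(s)
        for s, succs in graph.items():
            if len(succs) != 1:
                continue
            t = succs[0]
            if t in graph and t not in terminals and t != s and preds[t] == {s}:
                graph[s] = graph.pop(t)   # A inherits B's transitions
                changed = True
                break                      # restart: the graph was modified
    return graph

parser = {
    "start": ["parse_ethernet"],
    "parse_ethernet": ["parse_ip", "reject"],
    "parse_ip": ["accept", "reject"],
    "parse_unused": ["accept"],            # no transition ever targets this state
}
optimized = merge_single_fanout(drop_unreachable(parser))
print(optimized)
```

In this example `parse_unused` is dropped, and `start` absorbs `parse_ethernet` because it is the only state that jumps there.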

Optimization - Opportunities
Case 1: P4 code

    int<32> val;
    if (hdr.ethernet.dstAddr != hdr.ethernet.dstAddr) {
        val = 5 + 6;
    }
    else {
        val = 8;
    }
    ...
    // Access to val
    ...

Case 1: optimized by P4C (the constant 5 + 6 is folded, the branch remains)

    int<32> val;
    if (hdr.ethernet.dstAddr != hdr.ethernet.dstAddr) {
        val = 11;
    }
    else {
        val = 8;
    }
    ...
    // Access to val
    ...

Optimization - Opportunities
Case 1: the same P4 code optimized by P4LLVM

    int<32> val = 8;
    ...
    // Access to val
    ...

The condition hdr.ethernet.dstAddr != hdr.ethernet.dstAddr is always false, so the branch is removed and only the else assignment survives.
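
The difference between the two results in Case 1 comes down to folding the condition as well as the arithmetic. The Python sketch below models this on a tiny tuple-based expression tree; the representation and the names `fold` and `simplify_if` are invented for illustration and do not mirror either compiler's internals. It assumes that reading the same header field twice yields the same value.

```python
def fold(expr):
    """Fold constants and self-comparisons in a tuple-based expression tree.

    Leaves are ints (literals) or strings (field names); inner nodes are
    (operator, lhs, rhs) tuples.
    """
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr[0], fold(expr[1]), fold(expr[2])
    if op == "+" and isinstance(lhs, int) and isinstance(rhs, int):
        return lhs + rhs              # 5 + 6 -> 11
    if op == "!=" and lhs == rhs:
        return False                  # x != x can never hold
    if op == "==" and lhs == rhs:
        return True
    return (op, lhs, rhs)

def simplify_if(cond, then_assign, else_assign):
    """Return the statements that remain after folding an if/else.

    Each assignment is a (variable, expression) pair.
    """
    cond = fold(cond)
    then_assign = (then_assign[0], fold(then_assign[1]))
    else_assign = (else_assign[0], fold(else_assign[1]))
    if cond is True:
        return [then_assign]
    if cond is False:
        return [else_assign]
    return [("if", cond, then_assign, else_assign)]

cond = ("!=", "hdr.ethernet.dstAddr", "hdr.ethernet.dstAddr")
result = simplify_if(cond, ("val", ("+", 5, 6)), ("val", 8))
print(result)
```

When the two operands differ, for example `("!=", "a", "b")`, only the arithmetic is folded and the if/else is kept, which corresponds to the P4C result.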

Optimization - Opportunities
Case 2

    struct hdr { … }
    state parse_hdr {
        ...
        packet.extract(hdr);
        ...
        transition select (hdr.a) {
            default : accept;
        }
    }
    <No further use of hdr>

● select ➝ more instructions than required: can be removed
● Unnecessary extraction: can be removed
● Unused header: can be removed
● One optimization cascades to other optimizations
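
The cascade in Case 2 can be mimicked with a small fixed-point loop in which each rewrite exposes the next one. The Python sketch below is a toy model under stated assumptions: the `program` dictionary layout and the function `optimize` are hypothetical, and `extract` is treated as having no effect other than filling the header.

```python
def optimize(program):
    """Apply one rewrite at a time until none fires; return the rewrites in order."""
    steps = []
    while True:
        select = program.get("select")
        # Headers that are still read: by the select key or by later code.
        reads = set(program["other_reads"])
        if select:
            reads.add(select["key"])
        live = reads | set(program["extracts"])

        if select and set(select["cases"]) == {"default"}:
            # A select with only a default case is an unconditional transition.
            program["transition"] = select["cases"]["default"]
            program["select"] = None
            steps.append("select removed")
        elif any(h not in reads for h in program["extracts"]):
            # Nothing reads the extracted header any more.
            program["extracts"] = [h for h in program["extracts"] if h in reads]
            steps.append("unnecessary extraction removed")
        elif any(h not in live for h in program["headers"]):
            # The header is neither extracted nor read.
            program["headers"] = {h for h in program["headers"] if h in live}
            steps.append("unused header removed")
        else:
            return steps

program = {
    "headers": {"hdr"},
    "extracts": ["hdr"],
    "select": {"key": "hdr", "cases": {"default": "accept"}},
    "other_reads": set(),              # <No further use of hdr>
}
steps = optimize(program)
print(steps)
```

If `other_reads` contained `"hdr"`, only the select would be removed, since the extraction and the header would still be needed.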

Experimentation
● Whippersnapper#
  ○ Benchmark suite for studying the performance of P4 compilers
  ○ P4 compilers like P4C, P4FPGA, PISCES, Xilinx SDNet have been studied
● We extended it to Whippersnapper 2.0*
  ○ To generate complex expressions, conditionals, etc.
  ○ So that optimizations can be studied
● We study performance using Whippersnapper 2.0 in comparison with P4C
*https://github.com/IITH-Compilers/Whippersnapper-2.0
#Dang, Huynh Tu, et al. "Whippersnapper: A P4 language benchmark suite." Proceedings of the Symposium on SDN Research. ACM, 2017.

Experimentation: Setup
● LLVM IR optimized for size - Oz
● Latencies
  ○ Average of packet processing times of 10,000 packets

Optimization levels (LLVM 5.0.1)
❖ O1: targetlibinfo tti tbaa scoped-noalias assumption-cache-tracker profile-summary-info forceattrs inferattrs ipsccp globalopt domtree mem2reg deadargelim basicaa aa instcombine simplifycfg basiccg globals-aa prune-eh always-inline functionattrs sroa memoryssa early-cse-memssa speculative-execution lazy-value-info jump-threading correlated-propagation libcalls-shrinkwrap loops branch-prob block-freq pgo-memop-opt tailcallelim reassociate loop-simplify lcssa-verification lcssa scalar-evolution loop-rotate licm loop-unswitch indvars loop-idiom loop-deletion loop-unroll memdep memcpyopt sccp demanded-bits bdce dse postdomtree adce barrier rpo-functionattrs float2int loop-accesses lazy-branch-prob lazy-block-freq opt-remark-emitter loop-distribute loop-vectorize loop-load-elim latesimplifycfg alignment-from-assumptions strip-dead-prototypes loop-sink instsimplify verify
❖ O2: O1 + constmerge + elim-avail-extern + globaldce + gvn - always-inline + inline + mldst-motion + slp-vectorizer
❖ O3: O2 + argpromotion
❖ Os: O2 - libcalls-shrinkwrap - pgo-memop-opt
❖ Oz: Os - slp-vectorizer
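
The relations between the levels can be read as set arithmetic over pass names. The Python sketch below encodes exactly the additions and removals listed on this slide, starting from the O1 list; it treats each level as an unordered set, so it says nothing about the order in which LLVM actually schedules the passes.

```python
# Pass names copied from the O1 list on the slide (LLVM 5.0.1).
O1 = set("""
targetlibinfo tti tbaa scoped-noalias assumption-cache-tracker
profile-summary-info forceattrs inferattrs ipsccp globalopt domtree mem2reg
deadargelim basicaa aa instcombine simplifycfg basiccg globals-aa prune-eh
always-inline functionattrs sroa memoryssa early-cse-memssa
speculative-execution lazy-value-info jump-threading correlated-propagation
libcalls-shrinkwrap loops branch-prob block-freq pgo-memop-opt tailcallelim
reassociate loop-simplify lcssa-verification lcssa scalar-evolution
loop-rotate licm loop-unswitch indvars loop-idiom loop-deletion loop-unroll
memdep memcpyopt sccp demanded-bits bdce dse postdomtree adce barrier
rpo-functionattrs float2int loop-accesses lazy-branch-prob lazy-block-freq
opt-remark-emitter loop-distribute loop-vectorize loop-load-elim
latesimplifycfg alignment-from-assumptions strip-dead-prototypes loop-sink
instsimplify verify
""".split())

# Each level is derived from the previous one as stated on the slide.
O2 = (O1 | {"constmerge", "elim-avail-extern", "globaldce", "gvn",
            "inline", "mldst-motion", "slp-vectorizer"}) - {"always-inline"}
O3 = O2 | {"argpromotion"}
Os = O2 - {"libcalls-shrinkwrap", "pgo-memop-opt"}
Oz = Os - {"slp-vectorizer"}

print("added in Oz relative to O1:  ", sorted(Oz - O1))
print("removed in Oz relative to O1:", sorted(O1 - Oz))
```

Comparing `Oz` with `O1` this way shows which passes the size-oriented sequence used in the experiments gains and loses relative to the baseline level.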

Results: Action Complexity
● P4C-BMV2: Existing P4C compiler
● P4LLVM: Without optimizations.
● P4LLVM-Oz: With ‘Oz’ optimization sequence of LLVM
[Figures: result charts shown over several builds of this slide; annotations on the charts: ~250%, ~245%, ~60%]

Results: Table Depth
● P4C-BMV2: Existing P4C compiler
● P4LLVM: Without optimizations.
● P4LLVM-Oz: With ‘Oz’ optimization sequence of LLVM
[Figure: result chart]

Future Perspectives
● Tailoring a suitable pass sequence for P4
● Connecting with P4Runtime
● Extending backend support for P4LLVM

Summary
● LLVM fits the scope and spirit of P4 ⇒ P4LLVM
● Optimizations by P4LLVM show performance improvement
  ○ When compared to the established P4 compiler
● Target-independent optimizations with generic Oz sequence
  ○ More improvement ➝ tailor a target-specific sequence
● Source code: https://github.com/IITH-Compilers/P4LLVM
Acknowledgements: Cisco India Team - Raju Datla, Suresh Goduguluru, Sameek Banerjee

Extra slides

How?
[Diagram: P4C: Front-end, P4-IR, Mid-end; LLVM: LLVM-IR, Optimizations, JSON Back-end]
● Front-end module in P4C* to check syntactic and semantic correctness.
● We reuse P4 specific passes in the mid-end of P4C
  ○ like simplifyKey, simplifySelectList, removeParameters, etc.
  ○ these can be conveniently written within the LLVM framework.
*https://github.com/p4lang/p4c/tree/master/docs

Optimization levels (LLVM 5.0.1)
❖ O2: targetlibinfo tti tbaa scoped-noalias assumption-cache-tracker profile-summary-info forceattrs inferattrs ipsccp globalopt domtree mem2reg deadargelim basicaa aa instcombine simplifycfg basiccg globals-aa prune-eh inline functionattrs sroa memoryssa early-cse-memssa speculative-execution lazy-value-info jump-threading correlated-propagation libcalls-shrinkwrap loops branch-prob block-freq pgo-memop-opt tailcallelim reassociate loop-simplify lcssa-verification lcssa scalar-evolution loop-rotate licm loop-unswitch indvars loop-idiom loop-deletion loop-unroll mldst-motion memdep lazy-branch-prob lazy-block-freq opt-remark-emitter gvn memcpyopt sccp demanded-bits bdce dse postdomtree adce barrier elim-avail-extern rpo-functionattrs float2int loop-accesses loop-distribute loop-vectorize loop-load-elim slp-vectorizer latesimplifycfg alignment-from-assumptions strip-dead-prototypes globaldce constmerge loop-sink instsimplify verify

Optimization levels (LLVM 5.0.1)
❖ Oz: targetlibinfo tti tbaa scoped-noalias assumption-cache-tracker profile-summary-info forceattrs inferattrs ipsccp globalopt domtree mem2reg deadargelim basicaa aa instcombine simplifycfg basiccg globals-aa prune-eh inline functionattrs sroa memoryssa early-cse-memssa speculative-execution lazy-value-info jump-threading correlated-propagation loops branch-prob block-freq tailcallelim reassociate loop-simplify lcssa-verification lcssa scalar-evolution loop-rotate licm loop-unswitch indvars loop-idiom loop-deletion loop-unroll mldst-motion memdep lazy-branch-prob lazy-block-freq opt-remark-emitter gvn memcpyopt sccp demanded-bits bdce dse postdomtree adce barrier elim-avail-extern rpo-functionattrs float2int loop-accesses loop-distribute loop-vectorize loop-load-elim latesimplifycfg alignment-from-assumptions strip-dead-prototypes globaldce constmerge loop-sink instsimplify verify
Oz sequence ⇒ optimizations to reduce code size