nicolas le scouarnec joint work with …...fabien andré, stéphane gouache, nicolas le scouarnec...

27
technicolor.com NICOLAS LE SCOUARNEC JOINT WORK WITH FABIEN ANDRÉ, STÉPHANE GOUACHE, ANTOINE MONSIFROT

Upload: others

Post on 09-Apr-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

technicolor.com

NICOLAS LE SCOUARNEC

JOINT WORK WITH FABIEN ANDRÉ,

STÉPHANE GOUACHE, ANTOINE MONSIFROT

Page 2: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

2

Typical internet access network

Customer Premise Equipment

(Residential Gateway)

Access Network

Internet

Page 3: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

3

vCPE Rationale

Simplify software running on millions of

embedded devices

► Easier upgrades

► Better integration

Provide visibility into home network

► Secure IoT

► Remote troubleshooting

Page 4: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

4

Building middleboxes for residential networks

InternetMiddlebox

(e.g., NAT and Firewall)

Customer Premise Equipment

(Residential Gateway)

Access Network

Page 5: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

5

What (not) to use ?

NFV approach (virtualized appliances)

► One VM/container per customer

► Running existing software (e.g., OpenWRT or Linux)

► As done for example in R-CORD

Virtual Switches for traffic dispatching to VM

Does not scale to millions of VMs/containers

Not cost effective

Page 6: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

6

Which equipment to use ?

Vendor B

(HW Appliance)

Vendor C

(HW Appliance)

2x Xeon E5 v4

(40 cores)

Vendor A

(VM)

StatelessNF

(NSDI’17)

L4 Throughput

(Simple IMIX)

58 Mpps

130 Gbps

63 Mpps

140 Gbps

180 Mpps

400 Gbps

4,5 Mpps

10 Gbps

4 Mpps

10 Gbps

Cost 65 K$ (HW+SW) 200 K$

(HW+SW)

30 K$

(HW)

21 K$ (SW) NA

Redundancy

model

1+1 1+1 N+1 1+1 N+1

Objective:

180 Mpps / server

4.5 Mpps / core

Available SW for running

on COTS server

Page 7: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

7

The residential vCPE challenge

Build a middlebox (firewall, NAT, …)

for residential networks

from COTS hardware

Efficient, Reliable, Scalable

L4 connection tracking

For millions of users

Page 8: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

8

Best practices for high-performance networking software

Avoid context switches

► Use kernel-bypass systems (e.g., DPDK)

Don’t lock, don’t share

► Cross-core sharing is expensive even without explicit locking

Run-to-completion model

► Receive, process, transmit, without buffering nor blocking

Applying all these principles everywhere is non-trivial

Page 9: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

9

Reliable - sharding and replication

Shard 1

Shard 2

Shard 3

Shard 4

Shard 5Assign both a master server and

a slave server to each shardReplicate state

from master to slave for each shard

Provide reliability by design, not as an afterthought

Page 10: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

10

Replication - Availability rather than Consistency

No external DB► Faster insertion and lookup rate (450M lookups/second on 18 cores)

► Non-blocking (no remote memory access)

Availability rather than consistency► Networks are unreliable, applications will recover

► Yet, even short unavailabilities are noticed by user

► Master does not wait for acknowledgment from slave

Efficient lock-less replication► Batching for improved performance

► Same thread for packet processing and replication

► Traffic not interrupted during slave initialization, using support from hash table

Page 11: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

11

Efficient (I) – Sharding to the core

Shard 1

Shard 2

Shard 3

Shard 4

Shard 5

Page 12: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

12

Efficient (I) - Sharding to the core

Shard 1

Shard 2

Shard 3

Shard 4

Shard 5

Enforce share-nothing by binding each shard exclusively to a single CPU core

All packet processing & management done by the corresponding thread12

Page 13: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

13

Efficient (II) - Expose each core to the network

Shard 1

Shard 2

Shard 3

Shard 4

Shard 5Expose an independant identity for each core (not server nor NIC) on the network

One single mechanism to address between and within servers

Each core appears in the system as a independent router

06:00:00:02:12:45 / 172.24.0.1

06:00:00:02:13:A0 / 172.24.0.2

06:00:00:02:25:B2 / 172.24.0.3

06:00:00:02:F2:35 / 172.24.0.4

06:00:00:02:31:A5 / 172.24.0.5

06:00:00:02:13:AC / 172.24.0.6

06:00:00:02:45:D2 / 172.24.0.7

06:00:00:02:F9:A4 / 172.24.0.8

06:00:00:02:B2:30 / 172.24.0.9

06:00:00:02:53:BE / 172.24.0.10

06:00:00:02:DF:E3 / 172.24.0.11

06:00:00:02:A2:32 / 172.24.0.12

13

Page 14: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

14

Efficient (II) - Scalable load-balancing by NICs and Switches

Shard 1

Shard 2

Shard 3

Shard 4

Shard 5

Leverage existing top-of-rack switches and server-class NIC to entirely offload load-balancing

Physical L3 Switches are much more efficient than virtual switches

06:00:00:02:12:45 / 172.24.0.1

06:00:00:02:13:A0 / 172.24.0.2

06:00:00:02:25:B2 / 172.24.0.3

06:00:00:02:F2:35 / 172.24.0.4

06:00:00:02:31:A5 / 172.24.0.5

06:00:00:02:13:AC / 172.24.0.6

06:00:00:02:45:D2 / 172.24.0.7

06:00:00:02:F9:A4 / 172.24.0.8

06:00:00:02:B2:30 / 172.24.0.9

06:00:00:02:53:BE / 172.24.0.10

06:00:00:02:DF:E3 / 172.24.0.11

06:00:00:02:A2:32 / 172.24.0.12

BGP

BGP

BGP

IP Routing Table

14

Page 15: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

Efficient (III) – Handle reverse traffic efficiently

15

Shard 1

Shard 2

Shard 3

Shard 4

Shard 5

IP Routing Table

Shard 1

Shard 2

Shard 3

Shard 4

Shard 5

IP routing allows precise control on reverse path and also failover path

Traffic is highly asymetrical, use VLAN to improve hardware usage

IP routing allows more control than RSS or ECMP based distribution

VLAN 4 / VLAN 5 / 06:00:00:52:11:45 / 172.25.0.1 06:00:00:02:12:45 / 172.24.0.1

15

IP Routing Table

Page 16: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

16

Our design: benefits

Distribution across servers and across cores identical

► Simplified implementation

► Performance scale linearly across cores and across servers

Dynamic load-balancing included (dynamic routing + replication)

► Re-balance the load between servers

► Scale-out and in as demand evolve : elasticity

00:00 03:00 06:00 12:00 15:00 18:00 21:00 00:00

Daily Internet TrafficUnused resources

(75% potential savings

energy, cooling,…)

Page 17: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

17

Benchmarking

Multi-core, multi-server benchmarking tool following the same principles

System under test

(large-scale and multi-server)

Traffic generator

Page 18: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

18

Benchmarking

Multi-core, multi-server benchmarking tool following the same principles

System under test

(large-scale and multi-server)

Page 19: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

19

Performance

0

20

40

60

80

100

Linux(e.g., R-CORD)

Krononat

Mp

ps

Performance (12 cores) for established connections

Page 20: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

20

Scalability

0

10

20

30

40

50

60

70

80

1 2 3 4 5 6 7 8 9 10 11 12

Mp

ps

Core

Performance for established connections

Linux Krononat (without replication) Krononat (with replication) Objective

Objective:

4,5 Mpps/core

Page 21: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

21

Availability - Server departure

0

10

20

30

40

50

60

70

80

0 50 100 150 200 250 300 350 400 450 500 550 600 650 700 750

occu

rre

nce

s

Service interruption duration (ms)

Less than 600 ms → below network timeouts

Page 22: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

22

Conclusions

Resilient distributed middlebox using COTS hardware► 77 million packets per second on only 12 cores

• 6,4 Mpps/core above objective (4,5 Mpps/core)

► Recover from failures automatically without users noticing

► Cost-effective N+1 redundancy

► Redundancy and dynamic load-balancing allow elasticity

Re-usable design► Expose each core as a distinct entity to the network

► Push per-core traffic steering to the networking equipments (NIC, switches)

► Applied to multi-server multi-core benchmarking tool

Page 23: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

23

References

Don’t share, Don’t lock: Large-scale Software Connection Tracking with Krononat

Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot

USENIX ATC’18

Cuckoo++ Hash Tables: High-Performance Hash Tables for Networking

Applications

Nicolas Le Scouarnec

ACM/IEEE ANCS’18

Page 24: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

APPENDIX

Page 25: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

25

Building a distributed software CG-NAT/FW/…

► Bi-directional traffic

► Must filter unknown connections

L4 Load-balancers

► Maglev

► Ananta

► Fastly@NSDI

► SilkRoad

► …

Access network

► No-reverse path traffic (DSR)

► Leverage deterministic hashing

Page 26: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

26

Availability : Graceful departure

0

10

20

30

40

50

60

70

80

0 50 100 150 200 250 300 350 400 450 500 550 600 650 700 750

occu

rre

nce

s

Service interruption duration (ms)

Failure detection and recovery

Load

rebalancing

Page 27: NICOLAS LE SCOUARNEC JOINT WORK WITH …...Fabien André, Stéphane Gouache, Nicolas Le Scouarnec and Antoine Monsifrot USENIX ATC’18 Cuckoo++ Hash Tables: High-Performance Hash

27

Availability : Hard Failure

0

20

40

60

80

100

120

0 400 800 1200 1600 2000 2400 2800 3200 3600 4000 4400 4800 5200 5600 6000

occu

ren

ce

s

Service interruption duration (ms)

Failure detection and recoveryLoad

rebalancing

Less than 7s → below many network timeouts