Dragonflow Austin Summit Talk
TRANSCRIPT
![Page 1: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/1.jpg)
Eran Gampel, Huawei
Li Ma, AWCloud
Gal Sagie, Huawei
![Page 2: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/2.jpg)
Highlights
• Integral “Big Tent” project in OpenStack
• Designed for High Scale, Performance and Low Latency
• Lightweight and Simple
• Easily Extendable
• Distributed SDN Control Plane
• Focus on advanced networking services
• Distributes Policy Level Abstraction to the Compute Nodes
![Page 3: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/3.jpg)
Distributed SDN
[Diagram: Neutron-Server with the Dragonflow Plugin; four Compute Nodes, each hosting VMs and running OVS and Dragonflow with a DB Driver; a shared DB.]
![Page 4: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/4.jpg)
“Under The Hood”
[Diagram labels, grouped by component:
• Neutron Server: Dragonflow Plugin (Route, Core API, SG)
• Network DB: Redis, ETCD, RethinkDB, RAMCloud (RMC), ZooKeeper
• Dragonflow Controller on each Compute Node: Abstraction Layer; L2 App, L3 App, DHCP App, SG App, FIP/DNAT, FWaaS, LBaaS, …; Pluggable DB Layer
• NB DB Drivers: Redis, ETCD, RMC, ZooKeeper
• SB DB Drivers: OVSDB, OpenFlow, SmartNIC
• Pub/Sub Drivers: Redis, ØMQ
• OVS: vswitchd and OVSDB-Server (User Space), Kernel Datapath Module (Kernel Space), NIC
• Workloads: VM, Container
• One label reads “Future”]
![Page 5: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/5.jpg)
Current Release Features (Mitaka)
• L2 core API, IPv4, IPv6
• GRE/VxLAN/STT/Geneve tunneling protocols
• Distributed L3 Virtual Router
• Distributed DHCP
• Pluggable Distributed Database
  • ETCD, RethinkDB, RAMCloud, Redis, ZooKeeper
• Pluggable Publish-Subscribe
  • ØMQ, Redis
• Security Groups
  • OVS Flows leveraging connection tracking integration
• Distributed DNAT
• Selective Proactive Distribution
  • Tenant Based
![Page 6: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/6.jpg)
Pluggable Database Framework
Requirements
• HA + Scalability
• Different environments have different requirements
  • Performance, Latency, Scalability, etc.
Why Pluggable?
• Long time to productize
• Mature Open Source alternatives
• Allow us to focus on the networking services only
![Page 7: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/7.jpg)
Full Distribution
[Diagram: the Distributed DB holds DB Data 1, DB Data 2 and DB Data 3. Compute Node 1 through Compute Node N each run Dragonflow and OVS, and the Local Cache on every node holds all of DB Data 1, 2 and 3. Dragonflow DB Drivers: Redis, ETCD, ZooKeeper, RMC.]
![Page 8: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/8.jpg)
Selective Proactive Distribution
[Diagram: the Distributed Database holds DB Data 1, DB Data 2 and DB Data 3. The Local Cache on Compute Node 1 holds only DB Data 1; the Local Cache on Compute Node N holds only DB Data 2 and DB Data 3. Dragonflow DB Drivers: Redis, ETCD, ZooKeeper, RMC.]
![Page 9: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/9.jpg)
Selective Proactive Distribution
[Diagram: the Distributed DB holds Net1 (VM1, VM2) and Net2 (VM3, VM4). Compute Node 1 hosts VM1 and VM2, and its Local Cache holds only Net1. Compute Node 2 hosts VM3 and VM4, and its Local Cache holds only Net2.]
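The Net1/Net2 example on this slide can be sketched in a few lines of Python. This is an illustration only, not Dragonflow code: the dictionary standing in for the distributed DB and the helper `build_local_cache` are hypothetical names, and the filter is keyed by network as on the slide (the Mitaka feature list describes the real mechanism as tenant based).

```python
# Hypothetical stand-in for the distributed DB: network name -> VMs on it.
DISTRIBUTED_DB = {
    "Net1": {"VM1", "VM2"},
    "Net2": {"VM3", "VM4"},
}


def build_local_cache(db, local_vms):
    """Copy only the networks that have at least one VM on this node."""
    return {net: set(vms) for net, vms in db.items() if vms & local_vms}


# Compute Node 1 hosts VM1 and VM2, Compute Node 2 hosts VM3 and VM4.
cache_node1 = build_local_cache(DISTRIBUTED_DB, {"VM1", "VM2"})
cache_node2 = build_local_cache(DISTRIBUTED_DB, {"VM3", "VM4"})
```

Each node ends up with the slice of the topology that its own VMs can reach, which is what keeps the local cache small as the deployment grows.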
![Page 10: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/10.jpg)
Pluggable Pub/Sub
[Diagram: several Neutron Servers each run the Dragonflow Plugin with a Publisher (Redis or ØMQ). Each Compute Node runs a Dragonflow Local Controller with a Subscriber (Redis or ØMQ). A DB is shown alongside.]
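A minimal sketch of what a pluggable publisher/subscriber layer might look like, assuming a driver interface with just `publish` and `subscribe`. The class names (`PubSubDriver`, `InMemoryPubSub`, `LocalController`), the per-tenant topic naming, and the event layout are all hypothetical; a real driver would wrap Redis or ØMQ instead of an in-process dictionary.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class PubSubDriver(ABC):
    """Hypothetical driver interface; Redis or ZeroMQ would implement it."""

    @abstractmethod
    def subscribe(self, topic, callback):
        """Register a callback for events published on a topic."""

    @abstractmethod
    def publish(self, topic, event):
        """Deliver an event to every subscriber of a topic."""


class InMemoryPubSub(PubSubDriver):
    """In-process stand-in used only for this illustration."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        for callback in list(self._subscribers[topic]):
            callback(event)


class LocalController:
    """Compute-node side: records the events for the topics it follows."""

    def __init__(self, driver, topics):
        self.received = []
        for topic in topics:
            driver.subscribe(topic, self.received.append)


driver = InMemoryPubSub()
node1 = LocalController(driver, ["tenant-a"])
node2 = LocalController(driver, ["tenant-b"])

# Neutron-server side: the plugin publishes a change for tenant-a only.
event = {"table": "lport", "action": "create", "key": "port-1"}
driver.publish("tenant-a", event)
```

Because the controllers depend only on the abstract driver, swapping the transport does not touch the controller code.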
![Page 11: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/11.jpg)
Example Distributed DHCP
![Page 12: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/12.jpg)
Neutron DHCP Implementation
[Diagram: the Neutron Server talks over the Message Queue to the DHCP Agent on the Network Node, which hosts many DHCP namespaces, each with its own dnsmasq.]
Example
• 100 Tenants
• 3 vNet / tenant
= 300 DHCP Servers
![Page 13: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/13.jpg)
Dragonflow Distributed DHCP
1. The VM sends DHCP_DISCOVER
2. The flow is classified as DHCP and forwarded to the controller
3. The DHCP App sends DHCP_OFFER back to the VM
4. The VM sends DHCP_REQUEST
5. The flow is classified as DHCP and forwarded to the controller
6. The DHCP App populates DHCP_OPTIONS from the DB/CFG and sends DHCP_ACK
[Diagram: message sequence between the VM and the DHCP server (DHCP DISCOVER, DHCP OFFER, DHCP REQUEST, DHCP ACK). On the Compute Node, the VMs attach to br-int in OVS through qvoXXX ports; OVS talks OpenFlow to the Dragonflow Controller (Abstraction Layer; L2 App, L3 App, DHCP App, SG; Pluggable DB Layer), which is backed by the Distributed DB, annotated “This has everything in it”. Numbered callouts mark where each step occurs.]
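The numbered DHCP steps on this slide can be sketched as a small controller application. This is a simplified illustration, not the Dragonflow DHCP app: `DhcpMessage`, `DhcpApp`, `is_dhcp_client_packet` and the MAC-to-IP dictionary standing in for the local DB cache are hypothetical. The only protocol fact relied on is that DHCP clients send from UDP port 68 to UDP port 67.

```python
from dataclasses import dataclass, field
from typing import Optional


def is_dhcp_client_packet(udp_src, udp_dst):
    """Steps 2 and 5: classify a flow as DHCP (client port 68 to server port 67)."""
    return udp_src == 68 and udp_dst == 67


@dataclass
class DhcpMessage:
    msg_type: str               # "DISCOVER", "OFFER", "REQUEST" or "ACK"
    mac: str
    ip: Optional[str] = None
    options: dict = field(default_factory=dict)


class DhcpApp:
    """Hypothetical local DHCP responder running inside the controller."""

    def __init__(self, port_db, options):
        self._port_db = port_db   # MAC -> IP, stand-in for the local DB cache
        self._options = options   # options taken from DB/CFG

    def packet_in(self, msg):
        ip = self._port_db.get(msg.mac)
        if ip is None:
            return None           # unknown port: no reply
        if msg.msg_type == "DISCOVER":
            return DhcpMessage("OFFER", msg.mac, ip)                     # step 3
        if msg.msg_type == "REQUEST":
            return DhcpMessage("ACK", msg.mac, ip, dict(self._options))  # step 6
        return None


app = DhcpApp({"fa:16:3e:00:00:01": "10.0.0.5"}, {"router": "10.0.0.1", "mtu": 1450})
offer = app.packet_in(DhcpMessage("DISCOVER", "fa:16:3e:00:00:01"))
ack = app.packet_in(DhcpMessage("REQUEST", "fa:16:3e:00:00:01"))
```

Since the reply is computed on the compute node from locally cached data, no DHCP namespace or dnsmasq process is needed per network.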
![Page 14: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/14.jpg)
Is Dragonflow Ready?
AWCloud Point of View
![Page 15: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/15.jpg)
Who is AWcloud?
• A pure OpenStack player in China
• Started in 2012
• Intel Capital-backed Startup
• Broad deployment of production clouds in China
• Large-scale infra operations background
• Highly diverse set of workloads and use cases
![Page 16: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/16.jpg)
Case Study: a typical large-scale deployment
• A public cloud for local enterprises
• Built in cooperation with Gaoxin Yiyun, Dell and Intel
• Located in Guizhou Province
  • The heart of the big data industry in China
• 2500+ physical servers deployed in the data center
  • So far, 500+ physical servers have been virtualized by the AWcloud OpenStack distribution.
![Page 17: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/17.jpg)
Case Study: a typical large-scale deployment
![Page 18: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/18.jpg)
Dispatch Network Policy to Compute Nodes
Requirements:
• Scalability
• Reliability
Currently, we use the Neutron OVS plugin… but as workloads increase…
![Page 19: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/19.jpg)
Limitations in Large-scale deployments
- Messaging
- Persistent HA DB to store network policy
![Page 20: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/20.jpg)
Scale-OUT
![Page 21: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/21.jpg)
Scalability in Persistent Storage
RDBMS used in OpenStack
• The semantics are too strong for scale-out applications
• Critical performance loss due to those semantics
We Believe
• Centralized clustering cannot practically scale out to data center size
We Need
• A distributed data storage system which is…
  • Optimized for READ
  • Able to reach a CONSISTENT state for the whole system
  • Always HIGHLY AVAILABLE
  • Able to work properly under network PARTITION
[Slide annotation next to the list: “Necessary”]
![Page 22: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/22.jpg)
Scalability in Persistent Storage
We prefer BASE systems for data backends
• Basically Available
• Soft-state
• Eventually consistent
Is there any open source solution that can meet our requirements?
![Page 23: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/23.jpg)
Scalable Persistent Storage in Dragonflow
• A pluggable Key-Value Interface Layer
• Supported Solutions
  • ETCD
  • RAMCloud
  • ZooKeeper
  • Redis
Is it enough? Scalable and reliable?
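A pluggable key-value interface layer could look roughly like the sketch below. The method names, the `KeyValueDriver` base class, the `DRIVERS` registry and `load_driver` are assumptions made for illustration and are not taken from the Dragonflow source; an ETCD, RAMCloud, ZooKeeper or Redis backend would each provide its own subclass.

```python
from abc import ABC, abstractmethod


class KeyValueDriver(ABC):
    """Hypothetical minimal contract every backend driver would satisfy."""

    @abstractmethod
    def get_key(self, table, key):
        """Return the stored value, or None if the key is absent."""

    @abstractmethod
    def set_key(self, table, key, value):
        """Create or overwrite a value."""

    @abstractmethod
    def delete_key(self, table, key):
        """Remove a key if it exists."""

    @abstractmethod
    def get_all_keys(self, table):
        """Return the keys of one table, sorted."""


class InMemoryDriver(KeyValueDriver):
    """Dictionary-backed stand-in used only for this illustration."""

    def __init__(self):
        self._tables = {}

    def get_key(self, table, key):
        return self._tables.get(table, {}).get(key)

    def set_key(self, table, key, value):
        self._tables.setdefault(table, {})[key] = value

    def delete_key(self, table, key):
        self._tables.get(table, {}).pop(key, None)

    def get_all_keys(self, table):
        return sorted(self._tables.get(table, {}))


# Hypothetical registry: configuration picks the backend by name.
DRIVERS = {"memory": InMemoryDriver}


def load_driver(name):
    return DRIVERS[name]()


db = load_driver("memory")
db.set_key("lswitch", "net-1", {"name": "blue"})
db.set_key("lswitch", "net-2", {"name": "green"})
db.delete_key("lswitch", "net-2")
```

Keeping the contract this small is what makes it practical to support several backends, but it also means features such as transactions are not available to the layers above.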
![Page 24: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/24.jpg)
DB Consistency: Common Problem to all SDN Solutions
[Diagram: CLI / Dashboard (Horizon) / Orchestration Tool (Heat) call the Neutron API and Nova API. Neutron-server contains the ML2 Core Plugin (ML2.Drivers.Mechanism.XXX) and a Services Plugin, is backed by the Neutron DB, and reaches the Neutron Plugin Agents, Neutron-L3-Agent and Neutron-DHCP-Agent over the Message Queue (AMQP). Each Nova Compute node hosts VMs on a Virtual Switch. A vendor-specific API links Neutron to the SDN Controller, which has a North-bound Interface (REST?), SDN Apps (Load Balancer, Firewall, VPN, L3 Services, Topology Mgr., Overlay Mgr., Security), its own SDN DB, and a South-bound Interface (OpenFlow) towards the network switches.]
![Page 25: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/25.jpg)
DB Consistency: Common Problem to all SDN Solutions
Neutron DB
• Relational Database
• ACID system
• Stores the whole virtualized network topology for OpenStack
Dragonflow DB
• Key-value Store
• BASE system
• Stores the ‘partial’ virtualized network topology used in Dragonflow
![Page 26: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/26.jpg)
DB Consistency: Common Problem to all SDN Solutions
• Problem 1
  • The Neutron DB transaction is committed, but the related operations on the Dragonflow DB have failed.
• Problem 2
  • Concurrent API calls cause multiple transactions on a given Neutron object. The Neutron DB handles this well thanks to its ACID nature. How about the Dragonflow DB?
• Problem 3
  • Nested transactions can be done in the Neutron DB. How about the Dragonflow DB?
• Problem N…
![Page 27: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/27.jpg)
Some thoughts on DB Consistency
• Remove Neutron DB
  • Complicated solution when involving ML2
  • Cannot be done in a short period of time
• Introduce the pluggable key-value store into Neutron
  • How to work with SQLAlchemy?
  • Needs much more time for evaluation and deep discussion.
• Any other simple and straightforward solutions?
![Page 28: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/28.jpg)
DB Consistency in Dragonflow
• Introduce a distributed lock for coordination
  • Guarantees the atomicity of a given API call
  • Implemented in the Neutron core plugin layer
  • Project-based lock allows concurrency
[Diagram: CLI / Dashboard (Horizon) / Orchestration Tool (Heat) call the Neutron API. Inside Neutron-server, the Core Plugin is the Dragonflow Neutron Plugin, which obtains the distributed lock, is backed by the Neutron DB, and calls the Dragonflow NB API. Dragonflow has a North-bound Interface, SDN Apps (Topology Mgr., Overlay Mgr., Security), the DF DB, and a South-bound Interface (OpenFlow).]
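The idea of a project-based lock can be sketched as follows. This is a single-process stand-in, assuming `threading.Lock` in place of a real distributed lock service; `ProjectLockManager`, `create_network` and the two dictionaries playing the Neutron DB and Dragonflow DB are hypothetical names.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class ProjectLockManager:
    """Single-process stand-in for a distributed lock service (hypothetical)."""

    def __init__(self):
        self._guard = threading.Lock()            # protects the lock table
        self._locks = defaultdict(threading.Lock)

    @contextmanager
    def lock(self, project_id):
        with self._guard:
            project_lock = self._locks[project_id]
        with project_lock:                        # one writer per project
            yield


locks = ProjectLockManager()
neutron_db = {}      # stand-in for the Neutron DB
dragonflow_db = {}   # stand-in for the Dragonflow DB


def create_network(project_id, network_id, name):
    """Run both DB writes as one critical section for the project."""
    with locks.lock(project_id):
        neutron_db[network_id] = {"project": project_id, "name": name}
        dragonflow_db[network_id] = {"project": project_id, "name": name}


create_network("project-a", "net-1", "blue")
create_network("project-b", "net-2", "green")
```

Locking per project rather than globally means API calls from different projects never wait on each other, while two calls within the same project are serialized.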
![Page 29: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/29.jpg)
DB Consistency in Dragonflow
• Introduce an object synchronization mechanism
  • All the objects stored in both databases are versioned.
  • Take advantage of the CAS operations of the Dragonflow DB.
  • Sync the object when something unexpected happens.
[Diagram: a network object in the Neutron DB (Network_ID, Name, Status, MTU, VLAN, Availability Zone, Subnets) is mirrored in the SDN DB as Object_ID = Network_ID, Version = 5. The version is read and then updated with compare & swap. Compute Nodes are notified; on each, the Subscriber in the Dragonflow Local Controller receives the update and flows are flushed to the vSwitch.]
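Versioned objects combined with compare-and-swap can be illustrated with the sketch below. It is a single-threaded toy, so the check and the write inside `compare_and_swap` are not truly atomic as they would be in a real key-value store; `VersionedStore` and `update_with_retry` are hypothetical names.

```python
class VersionedStore:
    """Toy key-value store where every object carries a version number."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        """Return (version, value); a missing key reads as version 0."""
        return self._data.get(key, (0, None))

    def compare_and_swap(self, key, expected_version, value):
        """Write only if the stored version still matches the one we read."""
        current_version, _ = self.get(key)
        if current_version != expected_version:
            return False
        self._data[key] = (expected_version + 1, value)
        return True


def update_with_retry(store, key, mutate, max_attempts=3):
    """Read, modify, CAS; on a version conflict re-read and try again."""
    for _ in range(max_attempts):
        version, value = store.get(key)
        if store.compare_and_swap(key, version, mutate(value)):
            return version + 1
    raise RuntimeError("object changed concurrently; a full sync is needed")


store = VersionedStore()
created = store.compare_and_swap("net-1", 0, {"mtu": 1500})
stale_write = store.compare_and_swap("net-1", 0, {"mtu": 9000})
new_version = update_with_retry(store, "net-1", lambda v: {**v, "mtu": 1450})
```

A writer holding a stale version is rejected instead of silently overwriting newer data, which is the point at which the slide's "sync the object" step would be triggered.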
![Page 30: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/30.jpg)
Roadmap
![Page 31: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/31.jpg)
OpenStack Networking Challenges
• Scalability
  • Networking does not scale (<500 compute nodes)
• Performance
  • Networking performance is low (namespace overhead, huge control plane overhead, …)
• Operability
  • Reference implementation has lots of maintenance problems (e.g. thousands of concurrent DHCP servers, namespaces, etc.)
![Page 32: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/32.jpg)
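Dragonflow Scalability
[Chart: vertical axis “Scale (# Compute Nodes)” with ticks 1,000, 2,500, 4,000 and 10,000; horizontal axis “Time to Market” with ticks today, soon, n+1 and n+2. Plotted configurations: Dragonflow & Redis; Dragonflow & RAMCloud & ØMQ; Optimized Hybrid (Reactive/Proactive) Dragonflow.]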
![Page 33: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/33.jpg)
Roadmap
• Additional DB Drivers: ZooKeeper, Redis…
• Selective Proactive DB
• Pluggable Pub/Sub Mechanism
• DB Consistency
• Distributed DNAT
• Security Group
• Hierarchical Port Binding (SDN ToR), move to ML2
• Containers (Kuryr plugin and nested VM support)
• Topology Service Injection / Service Chaining
• Inter Cloud Connectivity (Border Gateway / L2GW)
• Optimize Scale and Performance
![Page 34: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/34.jpg)
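Hierarchical Network Management
[Diagram: the Neutron Server loads ML2 Mechanism Drivers for ODL and Dragonflow. Two layers are shown: the Overlay Virtual Topology and the Underlay Hardware (Rack, ToR, Core Router), with ODL next to the Core Router. A DB is also shown.]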
![Page 35: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/35.jpg)
Dragonflow and Containers
![Page 36: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/36.jpg)
Network Services…
[Diagram: “My Cloud” surrounded by network services: NAT, IPS, FW, DHCP, LB, L3, L2, VPN]
![Page 37: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/37.jpg)
Topology Based Service Injection
[Diagram: on a Compute Node, VM 1 and VM 2 attach to OVS, whose pipeline runs Table 0, Table 1, … Table N. External Apps add their own Table to the pipeline through OpenFlow / other API.]
![Page 38: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/38.jpg)
Service Injection Hooks
![Page 39: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/39.jpg)
Newton Release
New Applications
• IGMP Application
• Distributed Load Balancing (East/West)
• Brute Force prevention
• DNS service
• Distributed Metadata proxy
• Port Fault Detection
![Page 40: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/40.jpg)
Ride the Dragon!
• Documentation
  • https://wiki.openstack.org/wiki/Dragonflow
• Bugs & blueprints
  • https://launchpad.net/dragonflow
• DF IRC channel
  • #openstack-dragonflow
  • Weekly on Monday at 0900 UTC in #openstack-meeting-4 (IRC)
![Page 41: Dragonflow Austin Summit Talk](https://reader033.vdocuments.mx/reader033/viewer/2022051521/587beae61a28ab765a8b5a5d/html5/thumbnails/41.jpg)
Join the discussion!
• Dragonflow Work Sessions
  • April 27th, 9:50am, Hilton Austin MR414 – Testing & CI
  • April 27th, 11:00am, Hilton Austin MR414 – DB Consistency
  • April 28th, 11:20am, Hilton Austin MR406 – Roadmap
  • April 28th, 1:30pm, Hilton Austin MR414 – PUB/SUB
  • April 28th, 2:20pm, Hilton Austin MR414 – ML2 & Topology Injection