Building large scale cloud with OpenStack cascading
DESCRIPTION
The OpenStack cascading solution is designed for large scale distributed clouds, for example a cloud with on the order of a million VMs distributed across many data centers. OpenStack cascading concentrates on aggregating the APIs of multiple child OpenStacks: each child OpenStack works internally as an Amazon-like availability zone, while the parent OpenStack still exposes one standard OpenStack API to the end user. The solution thus brings Amazon-like availability zones into current OpenStack for large scale distributed clouds. The benefits are as follows:
• Aggregated distributed cloud with one OpenStack API endpoint
• Fault isolation with Amazon-like availability zones
• Reduced upgrade / OAM challenge, with availability zone granularity
• Plug & Play fast integration of multi-vendor / multi-data-center cloud infrastructure through the OpenStack API
• Scale-out architecture, even across multiple data centers
Please refer to https://wiki.openstack.org/wiki/OpenStack_cascading_solution for more information.
TRANSCRIPT
HUAWEI TECHNOLOGIES CO., LTD.
www.huawei.com
Huawei Confidential
Building large scale cloud
with OpenStack cascading
Chaoyi Huang ([email protected])
Hongning Wu([email protected])
04/11/2023
Last edited Jul 25, 2014
Agenda
1. Key issues to be solved for large scale cloud
2. Introducing OpenStack cascading solution
3. Large scale cloud deployment scenario
4. Architecture & How it works
5. Availability & Progress & Team
How to build 100k hosts cloud with OpenStack
Naturally, there are two ways:
• Scale up a single monolithic OpenStack region, but:
1. It is a big challenge for a single OpenStack to manage this scale, for example 100K hosts.
2. It cannot provide real fault isolation like Amazon's availability zones, especially considering the internal RPC messages that cross different physical cluster building blocks.
3. A single huge monolithic system brings high risk for OAM and troubleshooting, and a rolling software upgrade is a big challenge for even the most skilled operations team.
4. It is difficult for heterogeneous vendors' clusters to co-exist, e.g. juno/icehouse/vcenter…
• Set up hundreds of OpenStack regions with discrete API endpoints, but:
• The operator has to buy or develop its own cloud management platform to integrate the discrete clouds into one cloud, and the OpenStack API ecosystem is lost.
• Or the customer is left with split resource islands without any association…
How to build 100k hosts cloud with OpenStack
Basic design principles:
• Never break the current OpenStack architecture.
• Single standard OpenStack API endpoint that still works like one OpenStack, to stay ecosystem friendly.
• Introduce Amazon-like availability zones to the large scale cloud for fault isolation and OAM purposes.
• Fault isolation
  • In any accident, only part of the cloud will be impacted.
  • Very critical to meet the definition of an Amazon availability zone.
• Clear upgrade / OAM boundary, no upgrade / OAM propagation
  • Not all of the cloud should have to be upgraded / operated on. At least, not at the same time.
• Modular and scale-out design principles, not monolithic and scale-up
• Scale-out architecture. Horizontal scalability, even across multiple data centers
  • A large scale cloud will often include more than one data center.
• Plug & Play fast integration
  • Large scale cloud infrastructure should not be locked in to one vendor. Fast integration of multi-vendor infrastructure is an almost mandatory requirement for a large scale cloud.
• Build on the native code of OpenStack, relying on extra projects as little as possible.
Here comes OpenStack cascading solution
OpenStack can manage many computers, so why not treat OpenStack itself as one huge supercomputer?
[Diagram: one OpenStack managing many computers, next to a large scale cloud in which one OpenStack manages many OpenStacks through the OpenStack API]
• The parent OpenStack exposes the standard OpenStack API.
• The parent OpenStack uses the standard OpenStack API to manage many child OpenStacks.
• Each child OpenStack functions as an Amazon-like availability zone and is hidden behind the parent OpenStack.
OpenStack cascading is feasible thanks to the pluggable & extensible architecture
Usually, OpenStack manages the underlying nova-compute (KVM/Xen/…), cinder-volume (Ceph/LVM/…), L2/L3 agent (OVS/LinuxBridge/Router/…) and image store (Ceph/S3/…).
The magic solution is, at the cascading level, to replace nova-compute's hypervisor with Nova, cinder-volume's storage with Cinder, the L2/L3 agent's network facility with Neutron, Glance's image location with Glance, and Ceilometer's store with Ceilometer.
All of this is done through the current OpenStack driver/agent mechanism.
REALLY MAGIC AND FANTASTIC!!!
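The driver idea can be illustrated with a minimal Python sketch. It is not the real Nova driver interface: `ComputeDriver`, `ChildCloud`, `CascadingDriver` and `boot_vm` are hypothetical names invented here, and the child cloud is reduced to one in-memory call. The point is only that the caller cannot tell a hypervisor backend from a backend that is itself another cloud.

```python
from abc import ABC, abstractmethod


class ComputeDriver(ABC):
    """Hypothetical stand-in for a pluggable compute backend interface."""

    @abstractmethod
    def spawn(self, name: str) -> str:
        """Start a VM and return a backend-specific identifier."""


class HypervisorDriver(ComputeDriver):
    """Ordinary case: the backend is a hypervisor such as KVM."""

    def spawn(self, name: str) -> str:
        return f"kvm-domain:{name}"


class ChildCloud:
    """Hypothetical child OpenStack, reduced to a single 'boot' API call."""

    def __init__(self, zone: str) -> None:
        self.zone = zone
        self.servers: list[str] = []

    def boot(self, name: str) -> str:
        self.servers.append(name)
        return f"{self.zone}/server:{name}"


class CascadingDriver(ComputeDriver):
    """Cascading case: the 'hypervisor' is another cloud's API."""

    def __init__(self, child: ChildCloud) -> None:
        self.child = child

    def spawn(self, name: str) -> str:
        # Delegate to the child cloud instead of booting a VM locally.
        return self.child.boot(name)


def boot_vm(driver: ComputeDriver, name: str) -> str:
    # The caller is unaware of which backend sits behind the driver.
    return driver.spawn(name)
```

Calling `boot_vm(CascadingDriver(ChildCloud("az1")), "vm2")` goes through exactly the same code path as a hypervisor-backed driver; only the object behind the interface differs.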
Feasibility Reference: the power of OpenStack pluggable & extensible architecture
[Diagram: Nova-API (with API extensions), Nova-Scheduler (filter, weight), Nova-Conductor, DB and RabbitMQ (message bus) in front of several Nova-Compute nodes, which use the ESXi driver, the Libvirt driver and, for the cascading idea, a Nova driver that manages another Nova through its Nova-API]
Nova (and also Cinder, Neutron, Glance, Ceilometer, …) has a pluggable & extensible architecture. The backend hypervisor of Nova could be KVM, ESXi, vCenter or Xen. We can treat Nova as one more type of hypervisor and develop a driver to manage Nova. The same idea applies to Cinder/Neutron/Glance/Ceilometer…
OpenStack cascading solution
[Diagram: a large scale cloud in which one Cascading OpenStack sits above many Cascaded OpenStacks (AZ1 … AZn), each reached through the OpenStack API]
OpenStack cascading solution: the basic idea is to solve these challenges by using a Cascading OpenStack to integrate many Cascaded OpenStacks via the standard OpenStack API, while the cloud as a whole is still accessed through the OpenStack API.
The key idea is to make each Cascaded OpenStack work as a real Amazon-like availability zone, to solve the large scale cloud challenge.
Cascading OpenStack: provides the API, scheduling, orchestration and networking of the Cascaded OpenStacks.
Cascaded OpenStack: provisions the VM, volume and virtual networking resources.
How the solution solves the challenge elegantly
Fault Isolation Scenario 1: if one Cascaded OpenStack fails, the other parts of the cloud still work and remain accessible. This lets one Cascaded OpenStack act as an Amazon-like availability zone.
[Diagram: two views of the large scale cloud, with the Cascading OpenStack above the Cascaded OpenStacks (AZ1 … AZn), each reached through the OpenStack API]
How the solution solves the issues elegantly
Fault Isolation Scenario 2: if the Cascading OpenStack fails, all Cascaded OpenStacks are still independently manageable via the OpenStack API. In phase I, provisioning is not allowed in this state, to preserve consistency between the Cascading and Cascaded OpenStacks. In phase II, after the consistency issue is solved, provisioning can be allowed even while the Cascading OpenStack is down.
[Diagram: two views of the large scale cloud, with the Cascading OpenStack above the Cascaded OpenStacks (AZ1 … AZn), each reached through the OpenStack API]
How the solution solves the issues elegantly
Clear upgrade / OAM boundary, no upgrade / OAM propagation. Scenario 1: the Cascading OpenStack manages the Cascaded OpenStacks via the standard OpenStack API. The OpenStack API is a RESTful API with backward compatibility, so a V1 OpenStack can interact with a V2 OpenStack.
Scenario 1 is to upgrade / perform OAM (configuration changes or patches) on the Cascaded OpenStacks first.
After the upgrade / OAM of all Cascaded OpenStacks has finished, upgrade / perform OAM on the Cascading OpenStack.
[Diagram: four stages from a Version 1 Cascading OpenStack with Version 1 Cascaded OpenStacks (AZ1 … AZn) to a Version 2 Cascading OpenStack with Version 2 Cascaded OpenStacks. Stage labels: "Upgrade / OAM one AZ", "Upgrade / OAM AZ by AZ gradually", "Upgrade / OAM end user service", each with "Rollback if required". Version 1 and Version 2 driver/agent for Nova/Cinder/… are marked.]
How the solution solves the issues elegantly
Clear upgrade / OAM boundary, no upgrade / OAM propagation. Scenario 2: the Cascading OpenStack manages the Cascaded OpenStacks via the standard OpenStack API. The OpenStack API is a RESTful API with backward compatibility, so a V2 OpenStack can interact with a V1 OpenStack.
Scenario 2 is to upgrade / perform OAM on the Cascading OpenStack first, then upgrade / perform OAM on one Cascaded OpenStack to see whether the new version (or configuration change, or patch) works well. If it does, upgrade / perform OAM on the remaining Cascaded OpenStacks one by one.
[Diagram: four stages from a Version 1 Cascading OpenStack with Version 1 Cascaded OpenStacks (AZ1 … AZn) to a Version 2 Cascading OpenStack with Version 2 Cascaded OpenStacks. Stage labels: "Upgrade / OAM one AZ", "Upgrade / OAM AZ by AZ gradually", "Upgrade / OAM end user service", each with "Rollback if required". Version 1 and Version 2 driver/agent for Nova/Cinder/… are marked.]
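The AZ-by-AZ procedure with rollback can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a tool from the cascading project: `rolling_upgrade` is a hypothetical helper, versions are plain strings, and `healthy` is a health probe that the operator would have to supply.

```python
from typing import Callable


def rolling_upgrade(
    zones: dict[str, str],
    target: str,
    healthy: Callable[[str, str], bool],
) -> dict[str, str]:
    """Upgrade zones one at a time; roll a zone back if its check fails.

    zones maps a zone name to its current version. healthy(zone, version)
    is a hypothetical health probe supplied by the operator.
    """
    result = dict(zones)
    for zone, old_version in zones.items():
        result[zone] = target
        if not healthy(zone, target):
            # Roll back this zone only and stop; later zones are untouched.
            result[zone] = old_version
            break
    return result
```

Because each zone is handled independently, a failed check in one AZ leaves the zones upgraded earlier on the new version and the later zones on the old one, which is the "no upgrade / OAM propagation" property.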
How the solution solves the issues elegantly
Plug & Play fast integration: relying on the standard OpenStack API managed by the Cascading OpenStack, a new vendor's physical resources with OpenStack built in can be integrated into the cloud in a plug & play model, just like a USB device plugged into a PC. This benefit makes the OpenStack API the software-defined "PCI" bus of the cloud era.
[Diagram: a Cascading OpenStack connected through the OpenStack API to Vendor 1, Vendor 2, … Vendor n physical resources, each with a Cascaded OpenStack built in]
How the solution solves the issues elegantly
Scale-out architecture with horizontal scalability, even across multiple data centers: scalability applies not only within one Cascaded OpenStack, but also to multi-vendor Cascaded OpenStacks spread over many data centers. Because the OpenStack API is a RESTful API, it is feasible for one Cascading OpenStack to manage multiple data centers across a WAN.
[Diagram: a Cascading OpenStack connected through the OpenStack API to data centers DC1, DC2, … DCn, each with multi-vendor (Vendor 1, Vendor 2, … Vendor n) Cascaded OpenStacks built in]
Extra benefit from OpenStack cascading solution
1. It is still one OpenStack, even though cascading is used inside the cloud. There is one and only one OpenStack API interface for one large scale, distributed cloud, even across multiple data centers. The whole ecosystem built on the OpenStack API can be leveraged.
2. Vendor-neutral large scale cloud solution, integrated through the standard OpenStack API.
3. Scale-out cloud architecture. An ultra large scale cloud is split into replicable small Cascaded OpenStack instances, so expansion can be simplified to copy/paste.
Summary of OpenStack cascading solution
• Unified cloud with one OpenStack API
• Fault isolation AZ by AZ
• Upgrade / OAM AZ by AZ, no upgrade / OAM propagation
• Plug & Play fast integration through the OpenStack API
• No vendor lock-in, fast integration of multi-vendor infrastructure
• Scale-out architecture. Horizontal scalability, even across multiple data centers
• Expansion by replication, AZ by AZ
[Diagram: a Cascading OpenStack connected through the OpenStack API to Availability Zone 1, 2, 3, … n, built on Vendor 1, Vendor 2, … Vendor n infrastructure in data centers DC1 … DCn]
*** One AZ can still integrate heterogeneous hypervisors and heterogeneous physical resources.
Large scale cloud deployment scenario (1)
To expose one (and only one) unified OpenStack API for all data centers: every data center is deployed with one or several Cascaded OpenStacks, and only one Cascading OpenStack manages all Cascaded OpenStacks in all data centers.
[Diagram: one Cascading OpenStack (Nova, Cinder, Neutron, Ceilometer, Glance, KeyStone, Heat) connected through the OpenStack API to Cascaded OpenStacks (Nova, Cinder, Neutron, Ceilometer, Glance) in data centers DC1, DC2, … DCn]
* Glance could be a shared service inside one DC.
Large scale cloud deployment scenario (2)
To manage an ultra large scale data center using the OpenStack cascading solution:
• Each data center exposes the OpenStack API.
• A small data center is deployed with original OpenStack.
• An ultra large scale data center utilizes the power of the OpenStack cascading solution.
• Shared KeyStone. The KeyStone can be located in any data center but is shared by all.
[Diagram: data centers DC1, DC2, … DCn exposing the OpenStack API, with Cascading OpenStacks (Nova, Cinder, Neutron, Ceilometer, Glance, Heat), Cascaded OpenStacks (Nova, Cinder, Neutron, Ceilometer, Glance) and a shared KeyStone]
* Glance could be a shared service inside one DC.
** The backend is included in Glance and is not shown here.
How large a scale OpenStack cascading can support
[Diagram: one Cascading OpenStack (Nova, Cinder, Neutron, Ceilometer, Glance, KeyStone, Heat) managing Cascaded OpenStacks 1, 2, … 100 (Nova, Cinder, Neutron, Ceilometer, Glance), each with hosts 1, 2, 3, … 1000]
100K hosts = 100 (Cascaded OpenStacks) × 1000 (hosts per Cascaded OpenStack)
1M VMs = 10 (VMs per host) × 100K (hosts)
A cloud of more than 1M VMs can also be achieved with performance tuning.
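The sizing arithmetic can be checked with a short Python snippet. The three input figures come from the slide; the variable names are chosen here for illustration.

```python
CASCADED_OPENSTACKS = 100        # from the slide
HOSTS_PER_CASCADED = 1000        # from the slide
VMS_PER_HOST = 10                # from the slide

total_hosts = CASCADED_OPENSTACKS * HOSTS_PER_CASCADED
total_vms = total_hosts * VMS_PER_HOST

print(f"{total_hosts:,} hosts, {total_vms:,} VMs")
```

Changing any one of the three inputs shows how the total scales linearly with the number of Cascaded OpenStacks, the hosts per Cascaded OpenStack, or the VM density.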
OpenStack cascading Architecture (Nova/Cinder/Neutron)
[Diagram: the Cascading OpenStack controller node runs Nova-API, Nova-Scheduler, Nova-Conductor, Cinder-API, Cinder-Scheduler, Neutron-API and the Neutron plug-in, each with a DB and message bus. Compute 1 … Compute x at the cascading level run the components introduced for the OpenStack cascading solution: Nova-Proxy, Cinder-Proxy, L2-Proxy, L3-Proxy, LB-Proxy, VPN-Proxy and FW-Proxy. The proxies connect to the Nova-API, Cinder-API and Neutron-API of Cascaded OpenStack 1 … Cascaded OpenStack x, each of which has its own controller node (Nova-API, Nova-Scheduler, Nova-Conductor, Cinder-API, Cinder-Scheduler, Neutron-API, Neutron plug-in, DB, message bus) and compute nodes Compute 1 … Compute n hosting the VMs.]
Basic Assumption
System admin and tenant user have different views of the cloud
• The system admin knows the cascading topology. That means the system admin can access both the Cascading OpenStack API and the Cascaded OpenStack API.
• Tenant users only consume the Cascading OpenStack API and do not care whether the cloud uses the cascading solution or not.
Bottom up or top down?
• Provisioning is done top down. That means VMs/volumes/networks are provisioned from the Cascading OpenStack level.
• Physical-resource-aware management is done bottom up. For example, only the system admin is aware of the physical host attributes and knows how to group hosts into a HostAggregate. Another example is the volume type: only the system admin knows what type/QoS of volume the backend storage can provide, and defines the volume types to be used by tenant users. Therefore, this kind of management should be done/configured at the Cascaded OpenStack level. The Cascading OpenStack can only retrieve such information from the Cascaded OpenStack and use it when serving requests at the cascading level.
• *** Provisioning from the Cascaded OpenStack will be solved in phase II.
Weak consistency, eventual consistency
• Because the Cascading OpenStack is introduced, the true VM status is seen with some delay, but it does work. Consistency between the Cascading and Cascaded OpenStacks is not maintained in real time; it is reached eventually.
Logical object and real object mapping
• Just as the real object ID in the SAN differs from the Cinder volume ID, a logical object (for example a VM or volume) exists in the Cascading OpenStack, while the real object (for example the VM or volume) resides in the Cascaded OpenStack. The resource UUID mapping is stored for object addressing.
Python client
• The Python client already used by the CLI is used to make remote, RESTful API calls to the Cascaded OpenStack.
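The logical-to-real object mapping can be pictured as a small lookup table. The sketch below is an assumption-laden illustration: `UuidMap` and its methods are hypothetical names, and the store is an in-memory dictionary rather than the Cascading OpenStack DB.

```python
import uuid


class UuidMap:
    """Hypothetical store mapping logical (cascading) IDs to real (cascaded) IDs."""

    def __init__(self) -> None:
        self._real_by_logical: dict[str, tuple[str, str]] = {}

    def save(self, logical_id: str, zone: str, real_id: str) -> None:
        self._real_by_logical[logical_id] = (zone, real_id)

    def resolve(self, logical_id: str) -> tuple[str, str]:
        """Return (zone, real_id) for a logical object, or raise KeyError."""
        return self._real_by_logical[logical_id]


mapping = UuidMap()
logical_vm = str(uuid.uuid4())   # ID the tenant sees at the cascading level
real_vm = str(uuid.uuid4())      # ID assigned by the cascaded OpenStack
mapping.save(logical_vm, "az1", real_vm)
```

A request that names `logical_vm` is first resolved to the zone and real ID, and only then forwarded to the matching Cascaded OpenStack.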
Nova Cascading – Nova-Proxy
• Nova-Proxy
Nova-Proxy plays the same role in the Cascading OpenStack that Nova-Compute plays in an ordinary OpenStack.
One Nova-Proxy is configured to be responsible for one Cascaded OpenStack availability zone.
One Cascaded OpenStack usually acts as one EC2-like availability zone. That means all compute nodes in one Cascaded OpenStack usually belong to the same availability zone. Of course, a Cascaded OpenStack can include more than one availability zone.
The Cascading OpenStack can manage many Nova-Proxies for different availability zones, e.g. to manage many EC2-like availability zones.
Nova-Scheduler at the cascading level schedules a proper Nova-Proxy according to the availability zone the Nova-Proxy belongs to, just like the general scheduling procedure. After receiving a run_instance (etc.) request from the message bus, Nova-Proxy does not boot a VM directly, but treats the cascaded Nova as its hypervisor and converts the internal request into a Nova RESTful API call to the pre-configured cascaded Nova.
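The flow described above (pick a proxy by availability zone, then forward the request instead of booting locally) can be sketched as follows. Everything here is hypothetical: `FakeCascadedNova` stands in for the cascaded Nova REST API, and `NovaProxySketch` and `pick_proxy` are illustrative names, not classes from the cascading code base.

```python
import uuid


class FakeCascadedNova:
    """Stand-in for the cascaded Nova REST API (hypothetical, in-memory)."""

    def __init__(self) -> None:
        self.servers: dict[str, dict] = {}

    def create_server(self, name: str, flavor: str) -> str:
        server_id = str(uuid.uuid4())
        self.servers[server_id] = {"name": name, "flavor": flavor, "status": "ACTIVE"}
        return server_id


class NovaProxySketch:
    """Handles requests for one availability zone by calling a cascaded Nova."""

    def __init__(self, zone: str, cascaded_nova: FakeCascadedNova) -> None:
        self.zone = zone
        self.cascaded_nova = cascaded_nova
        self.uuid_map: dict[str, str] = {}   # logical VM ID -> real VM ID

    def run_instance(self, logical_id: str, name: str, flavor: str) -> str:
        # Do not boot a VM locally: forward the request to the cascaded Nova.
        real_id = self.cascaded_nova.create_server(name, flavor)
        self.uuid_map[logical_id] = real_id
        return real_id


def pick_proxy(proxies: list[NovaProxySketch], zone: str) -> NovaProxySketch:
    """Scheduling by availability zone only, as described above."""
    for proxy in proxies:
        if proxy.zone == zone:
            return proxy
    raise LookupError(f"no proxy for availability zone {zone!r}")
```

In a real deployment the forwarding step would be a RESTful call made through the Python client; the in-memory fake only keeps the sketch self-contained.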
Nova Cascading – how it works
[Diagram: at the Cascading OpenStack level, Nova-API, Nova-Scheduler, Nova-Conductor and the DB share the RabbitMQ message bus with several Nova-Proxies; each Nova-Proxy calls the Nova-API of one cascaded Nova at the Cascaded OpenStack level]
Nova cascading solution: the core idea is that one Nova-Proxy acts as the delegate for one cascaded Nova availability zone.
Nova-Proxy is configured to manage a cascaded Nova with a specified availability zone. All VMs in the cascaded Nova of this AZ are scheduled to, and located on, the Nova-Proxy host at the cascading level.
The request is scheduled to the proper Nova-Proxy through the availability zone information in the request.
Otherwise, the request is transferred to the proper Nova-Proxy through the host attribute of the VM, which is located on the Nova-Proxy host.
class Nova-Proxy(manager.SchedulerDependentManager) {
    Proxy Nova requests to the cascaded Nova after UUID translation
    Save the new VM UUID mapping if this is a new VM
    Inject the UUID mapping into the cascading Ceilometer
    Poll the batch VM status / task status and inject it into the DB
}
Nova Cascading – more description
1. We did not develop a new virt driver, but replaced the Nova compute manager.py by changing the default configuration. The only reason is that we want to transparently pass the token from the cascading API call through to the Cascaded OpenStack API.
2. HostAggregate is managed at the Cascaded OpenStack. The information is polled by the cascading Nova-Proxy and synchronized to the Cascading OpenStack DB.
3. Flavors can be managed by the tenant user. A flavor is synchronized to the Cascaded OpenStack only when VM creation/manipulation makes it necessary.
4. A smarter VM status synchronization method could be developed to reduce the messages to the DB and message bus. Currently only periodic polling is used.
5. Querying VM status in batches and writing the batch of VM statuses to the DB greatly reduces the messages to the DB and message bus, so the scale of the whole cloud can be increased.
6. If no AZ is specified in the API request, a default AZ is currently used (*this is the implementation in the community OpenStack edition), which is not a good idea. The request should be scheduled according to the AZs' available resources.
7. For a large scale cloud, the resource consumption of each Cascaded OpenStack should be monitored; if it is above a threshold, capacity expansion must be taken into account. The current design schedules based on availability zone only; for precise scheduling, an API to query the free resources of an AZ should be developed, and scheduling should follow the available resources.
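The batch status synchronization in items 4 and 5 can be sketched as one function that takes the result of a single batch query and touches the cascading-level DB only for records that changed. The function name, the dictionary-based DB and the status strings are assumptions made for this illustration.

```python
def sync_vm_status(
    cascaded_status: dict[str, str],
    uuid_map: dict[str, str],
    cascading_db: dict[str, str],
) -> int:
    """Apply one batch of cascaded VM statuses to the cascading-level DB.

    cascaded_status: real VM ID -> status, as returned by one batch query.
    uuid_map: logical VM ID -> real VM ID.
    cascading_db: logical VM ID -> last known status (updated in place).
    Returns the number of records that actually changed.
    """
    changed = 0
    for logical_id, real_id in uuid_map.items():
        status = cascaded_status.get(real_id)
        if status is not None and cascading_db.get(logical_id) != status:
            cascading_db[logical_id] = status
            changed += 1
    return changed
```

Writing only the changed records is one simple way to cut DB and message bus traffic; a periodic poller would call this once per polling interval and per Cascaded OpenStack.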
Cinder Cascading – Cinder-Proxy
• Cinder-Proxy
Cinder-Proxy plays the same role in the Cascading OpenStack that Cinder-Volume plays in an ordinary OpenStack.
One Cinder-Proxy is configured to be responsible for one Cascaded OpenStack availability zone. The Cascaded OpenStack availability zones for Cinder and Nova are configured to be the same zone, so that one availability zone applies to both Nova and Cinder. This is for fault isolation.
One Cascaded OpenStack usually acts as one EC2-like availability zone. That means all compute nodes and all cinder-volumes in one Cascaded OpenStack belong to the same availability zone.
The Cascading OpenStack can manage many Cinder-Proxies for different availability zones, e.g. to manage many EC2-like availability zones.
Cinder-Scheduler at the cascading level schedules a proper Cinder-Proxy according to the availability zone the Cinder-Proxy belongs to, just like the general scheduling procedure. After receiving a create_volume (etc.) request from the message bus, Cinder-Proxy does not create the volume directly, but treats the cascaded Cinder as its block storage backend and converts the internal request into a Cinder RESTful API call to the pre-configured cascaded Cinder.
Cinder Cascading – How it works
[Diagram: at the Cascading OpenStack level, Cinder-API, Cinder-Scheduler and the DB share the RabbitMQ message bus with several Cinder-Proxies; each Cinder-Proxy calls the Cinder-API of one cascaded Cinder at the Cascaded OpenStack level]
Cinder cascading solution: the core idea is that one Cinder-Proxy acts as the delegate for one cascaded Cinder availability zone.
Cinder-Proxy is configured to manage a cascaded Cinder with a specified availability zone. All volumes/snapshots/backups in the cascaded Cinder of this AZ are scheduled to, and located on, the Cinder-Proxy host at the cascading level.
The request is scheduled to the proper Cinder-Proxy through the availability zone (and volume type) information in the request.
Otherwise, the request is transferred to the proper Cinder-Proxy through the host attribute of the volume, which is located on the Cinder-Proxy host.
class Cinder-Proxy(manager.SchedulerDependentManager) {
    Proxy Cinder requests to the cascaded Cinder after UUID translation
    Save the new volume/snapshot/backup UUID mapping if this is a new volume/snapshot/backup
    Inject the UUID mapping into the cascading Ceilometer
    Poll the batch volume/snapshot/backup status / task status and inject it into the DB
}
Cinder Cascading – more description
1. We did not develop a new volume driver, but replaced the cinder-volume manager.py by changing the default configuration. The only reason is that we want to transparently pass the token from the cascading API call through to the Cascaded OpenStack API.
2. Volume types are managed at the Cascaded OpenStack. The information is polled by the cascading Cinder-Proxy and synchronized to the Cascading OpenStack DB. In the Cascading OpenStack, the volume type carries AZ information in order to avoid volume type collisions, because not every AZ supports every volume type. If volume types are named uniquely and all Cascaded OpenStacks support all volume types, the AZ information does not need to be attached.
3. Because we limit the capacity of one Cascaded OpenStack, the AZ scope is shared by Nova and Cinder. That means the AZ has the same meaning for Nova and Cinder. This design dramatically reduces the complexity of the OpenStack cascading solution.
4. A smarter volume status synchronization method could be developed to reduce the messages to the DB and message bus. Currently only periodic polling is used.
5. Querying volume status in batches and writing the batch of volume statuses to the DB greatly reduces the messages to the DB and message bus, so the scale of the whole cloud can be increased.
6. If no AZ is specified in the API request, a default AZ is currently used (*this is the implementation in the community OpenStack edition), which is not a good idea. The request should be scheduled according to the AZs' available resources.
7. For a large scale cloud, the resource consumption of each Cascaded OpenStack should be monitored; if it is above a threshold, capacity expansion must be taken into account. The current design schedules based on availability zone / volume type only; for precise scheduling, an API to query the free resources of an AZ should be developed.
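Item 2 above, attaching AZ information to volume types to avoid collisions, can be illustrated with a table keyed by (zone, type name). This is a sketch under assumptions: the function names and the "name@zone" display format are invented here and do not describe how the cascading code stores volume types.

```python
def merge_volume_types(types_by_zone: dict[str, list[str]]) -> dict[tuple[str, str], str]:
    """Build the cascading-level volume type table keyed by (zone, type name).

    Keying by zone keeps identically named types from different Cascaded
    OpenStacks apart. The value is a display name for the cascading level;
    the "name@zone" format is an illustrative choice.
    """
    table: dict[tuple[str, str], str] = {}
    for zone, names in types_by_zone.items():
        for name in names:
            table[(zone, name)] = f"{name}@{zone}"
    return table


def zones_supporting(table: dict[tuple[str, str], str], type_name: str) -> list[str]:
    """List the zones that offer a given volume type, in sorted order."""
    return sorted(zone for (zone, name) in table if name == type_name)
```

With this shape, a scheduler can combine the requested availability zone and volume type into one key, and a type that exists in only some zones never collides with a same-named type elsewhere.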
Neutron Cascading – interaction with Nova
[Diagram: Nova-API, Nova-Proxies, Neutron-API and the L2/L3-Proxy at the Cascading OpenStack level; Nova, Neutron, the VM, the DVR and the subnet at the Cascaded OpenStack level]
1. Create VM
2. Create port
3. Create network / subnet / port (IP/MAC)
4. Fake VIF plug to pass the UUID mapping
5. Create VM (port)
6. VIF plug
7. Periodic polling of port status
The network/subnet/port is created in the Cascaded OpenStack where the VM resides.
Neutron Cascading – remote port mapping (1)
If one VxLAN network has VMs attached from two AZs, the VxLAN network is created in both Neutrons, located in the different AZs. Networking inside one AZ can be solved with the current mechanism, but what about L2/L3 networking across OpenStacks?
[Diagram: at the Cascading OpenStack level, Neutron-API with two L2/L3-Proxies and the network VxLAN0 with VM1 to VM4 and a DVR; at the Cascaded OpenStack level, AZ1 and AZ2, each with a Neutron, VxLAN0, a DVR and its own VMs]
Neutron Cascading – remote port mapping (2)
First, we must solve cross-OpenStack L2 VxLAN networking so that all VMs attached to VxLAN0 can reach each other. This can be done by creating a virtual remote port via L2 population, using (VM MAC / remote host IP) to set up VxLAN tunneling.
[Diagram: at the Cascading OpenStack level, Neutron-API with two L2/L3-Proxies and the network VxLAN0 with VM1 to VM4 and a DVR; at the Cascaded OpenStack level, AZ1 and AZ2, each with a Neutron, VxLAN0, a DVR and its own VMs, plus a virtual remote port for VM2]
1. Periodic polling of port status (for example the VM2 port)
2. VM2 port status is up
3. L2 population
4. fdb_add (port for VM2 IP / VM2 MAC / host IP)
5. Create virtual remote port for VM2 (with VM2 IP / VM2 MAC / VM2 host IP)
6. Internal L2 population and DVR population for the virtual remote port for VM2
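The information carried by a virtual remote port (VM IP, VM MAC, remote host IP) and the forwarding entries derived from it can be sketched with plain data structures. `PortInfo`, `remote_ports_for` and `fdb_entries` are hypothetical names for this illustration; the real mechanism is Neutron's L2 population, which this sketch does not call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortInfo:
    vm_ip: str
    vm_mac: str
    host_ip: str     # VxLAN tunnel endpoint of the host where the VM runs
    zone: str


def remote_ports_for(zone: str, active_ports: list[PortInfo]) -> list[PortInfo]:
    """Ports that must appear as virtual remote ports inside the given zone."""
    return [port for port in active_ports if port.zone != zone]


def fdb_entries(ports: list[PortInfo]) -> dict[str, str]:
    """Forwarding entries: VM MAC -> remote host IP used for the tunnel."""
    return {port.vm_mac: port.host_ip for port in ports}
```

For each AZ, the ports of VMs that live in other AZs become virtual remote ports, and their (MAC, host IP) pairs are what the local tunneling setup needs in order to reach them.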
Neutron Cascading – remote port mapping (3)
Through the virtual remote port and L2 population at both the cascading and cascaded levels, all VMs can reach each other with the correct tunneling information (VM MAC / remote host IP). The DVR also knows where each VM resides according to (VM IP / remote host IP).
[Diagram: at the Cascading OpenStack level, Neutron-API with two L2/L3-Proxies and the network VxLAN0 with VM1 to VM4 and a DVR; at the Cascaded OpenStack level, AZ1 and AZ2, each with a Neutron, VxLAN0, a DVR, its own VMs and virtual remote ports for the other VMs]
Neutron Cascading – remote port mapping (4)
Let's add one more VxLAN network to the DVR. Steps 2, 3, 4 and 5 are the DVR process that happens at the cascading level. Through the DVR process and the virtual remote port method, the DVR knows where each VM resides according to (VM IP / remote host IP). L2/L3 can then work across Cascaded OpenStacks.
[Diagram: at the Cascading OpenStack level, Neutron-API with three L2/L3-Proxies, VxLAN0 (VM1 to VM4) and VxLAN1 (VM5, VM6) attached to the DVR; at the Cascaded OpenStack level, AZ1, AZ2 and AZ3, each with a Neutron, a DVR, its own VMs and virtual remote ports]
Step labels in the diagram (some steps appear in two variants):
1. Add router interface (VxLAN1 -> DVR)
2. Router update (VxLAN1, DVR)
3. Add router interface (VxLAN1 -> DVR); or create VxLAN 1 and add router interface (VxLAN1, DVR)
4. Get port by subnet (VxLAN 0); or get port by subnet (VxLAN 1)
5. Create virtual remote port for all VMs in VxLAN 0; or for all VMs in VxLAN 1
Neutron Cascading – remote subnet mapping (1)
A challenge remains. If the tenant has two VLAN networks residing in two AZs, there is no VxLAN L2 population, so how can a virtual remote port be created? The virtual remote port does not work here.
[Diagram: at the Cascading OpenStack level, Neutron-API with two L2/L3-Proxies, VLAN0 (VM1, VM2) and VLAN1 (VM3, VM4) attached to a DVR; at the Cascaded OpenStack level, AZ1 with VLAN0 and a DVR, AZ2 with VLAN1 and a DVR]
Neutron Cascading – remote subnet mapping (2)
The general way to solve the issue is to set up a VPN among the routers. The shortcomings of this method are that it is heavy, costs performance, and involves more API interaction and technology integration.
[Diagram: at the Cascading OpenStack level, Neutron-API with two L2/L3-Proxies, VLAN0 (VM1, VM2) and VLAN1 (VM3, VM4) attached to a DVR; at the Cascaded OpenStack level, AZ1 with VLAN0 and AZ2 with VLAN1, whose DVRs are connected by a VPN]
HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential
Neutron Cascading – remote subnet mapping(3)
Page 38
Neuton-API
L2/L3-ProxyL2/L3-Proxy
Neutron Neutron
VM1 VM2
VLAN0
Fortunately, router has feature to add explicit route for special destination. There are two issues here: 1. The next hop address should be tunneling purpose, other wise tenant network will be interfered by underlying
physical network 2. To set next hop for each VM destination will be a challenge: set up IP-IP tunneling, must to know each VM host
IP
AZ1 AZ2
VM1 VM2
VLAN0 DVR
DVR
VM3 VM4
VLAN1
VM3 VM4
VLAN1 DVR
"router": {"routes": [
{"nexthop": "10.10.10.10", "destination": "192.168.2.10/32"}
]}
"router": {"routes": [
{"nexthop": "10.10.20.20", "destination": "192.168.1.20/32"}
]}
Inspiration 1:
Router’s next hop setting
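The two route fragments on this slide have the shape of a router-update body with a `routes` list, as accepted by Neutron's extra-route extension. The following is a minimal sketch that builds such bodies in Python. The helper name `build_route_update` is hypothetical, the addresses are the ones shown on the slide, and actually sending the body to Neutron is left out.

```python
import ipaddress
import json


def build_route_update(routes):
    """Build a router-update body carrying explicit routes.

    `routes` is a list of (destination_cidr, nexthop_ip) pairs.
    The helper name is hypothetical; only the body layout follows the slide.
    """
    entries = []
    for destination, nexthop in routes:
        # Validate both values before they go into the request body.
        ipaddress.ip_network(destination)
        ipaddress.ip_address(nexthop)
        entries.append({"nexthop": nexthop, "destination": destination})
    return {"router": {"routes": entries}}


# Values taken from the slide: one /32 host route per remote VM.
first_body = build_route_update([("192.168.2.10/32", "10.10.10.10")])
second_body = build_route_update([("192.168.1.20/32", "10.10.20.20")])
print(json.dumps(first_body, indent=2))
```

Because every remote VM needs its own /32 entry, the list grows with the number of VMs, which is the second issue named on the slide.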
Neutron Cascading – remote subnet mapping(4)
Page 39
Neutron-API
L2/L3-Proxy L2/L3-Proxy
Neutron Neutron
VM1 VM2
VLAN0
AZ1
VM1 VM2
VLAN0 DVR
DVR
VM3 VM4
VLAN1
VM3 VM4
VLAN1 DVR
VxLAN relay network
Drawing on the VxLAN virtual remote port experience and the router next-hop inspiration, a natural idea is to create a VxLAN relay network that bridges networks across OpenStack instances. The drawback is that a virtual relay VxLAN network has to be created and maintained (IP addresses, ports, …) for each router or each tenant. This is still not a good idea.
AZ2
Neutron Cascading – remote subnet mapping(4)
Page 40
Inspiration 2: pluggable-external-network
https://blueprints.launchpad.net/neutron/+spec/pluggable-ext-net
https://review.openstack.org/#/c/88619/5/specs/juno/pluggable-ext-net.rst
1. Piggyback network introduced. Using the address space 100.64.0.0/10 [#] will guarantee no conflict with tenant networks.
[#] http://www.iana.org/assignments/ipv4-address-space/ipv4-address-space.xhtml
[#] http://tools.ietf.org/html/rfc6598#section-1
2. Onlink routes introduced. On external networks with multiple subnets, routers need onlink routes for all subnets ( https://bugs.launchpad.net/neutron/+bug/1312467 )
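The no-conflict claim for 100.64.0.0/10 presumably relies on tenants not choosing subnets from that range. A minimal sketch of how that assumption could be checked with Python's standard `ipaddress` module follows; the function names are hypothetical and not part of the blueprint.

```python
import ipaddress

# Shared address space reserved by RFC 6598, used here for the piggyback network.
PIGGYBACK_SPACE = ipaddress.ip_network("100.64.0.0/10")


def conflicts_with_piggyback(tenant_cidr):
    """Return True if a tenant subnet overlaps the piggyback address space."""
    return ipaddress.ip_network(tenant_cidr).overlaps(PIGGYBACK_SPACE)


def in_piggyback_space(address):
    """Return True if an address can be used on the piggyback network."""
    return ipaddress.ip_address(address) in PIGGYBACK_SPACE
```

A tenant subnet such as 192.168.1.0/24 passes the check, while a subnet carved out of 100.64.0.0/10 would be flagged.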
Router
Router
Router
100.64.0.0/10
Internet
Onlink routes
Onlink routes
Neutron Cascading – remote subnet mapping(5)
Page 41
Inspiration 1 + Inspiration 2:
1. Piggyback network introduced. Using the address space 100.64.0.0/10.
2. Onlink routes introduced. N-S routers with onlink routes.
3. Remote subnet mapping. Remote subnets are mapped by routing to the next-hop router.
DVR (Centralized Node)
N-S Router
100.64.0.0/10
Internet
Onlink routes
Onlink routes
VM1 VM2
VLAN0 DVR
VM3 VM4
VLAN1 DVR
DVR (Centralized Node)
1. DVR (external network: 100.64.10.10)
2. DVR (next hop "100.64.20.20", destination "192.168.2.0/24")
3. DVR (next hop "100.64.30.30", destination "0.0.0.0/0")
192.168.2.0/24 192.168.1.0/24
1. DVR (external network: 100.64.20.20)
2. DVR (next hop "100.64.10.10", destination "192.168.1.0/24")
3. DVR (next hop "100.64.30.30", destination "0.0.0.0/0")
100.64.20.20 100.64.10.10
100.64.30.30
A reserved VxLAN VNI is used for tunneling on the piggyback network.
AZ1 AZ2
It is still a single DVR centralized node, except that this node is connected to the public network.
Value:
1. No virtual remote port mapping is required, which reduces the L2/L3 population burden.
2. It works not only for VLAN but also for VxLAN, as long as the VxLAN exists in only one cascaded OpenStack. We call this kind of VxLAN network a local VxLAN network.
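The effect of the route settings on this slide can be illustrated with a longest-prefix-match lookup. The sketch below uses the routes listed for the DVR whose external address is 100.64.10.10; the function `next_hop` is hypothetical and only models the routing decision, not the VxLAN tunneling over the piggyback network.

```python
import ipaddress

# Routes of the DVR whose external address is 100.64.10.10 (values from the slide).
ROUTES = [
    ("192.168.2.0/24", "100.64.20.20"),  # remote subnet, reached via the peer DVR
    ("0.0.0.0/0", "100.64.30.30"),       # default route
]


def next_hop(destination, routes=ROUTES):
    """Pick the next hop by longest-prefix match, or None if nothing matches."""
    address = ipaddress.ip_address(destination)
    best = None
    for cidr, hop in routes:
        network = ipaddress.ip_network(cidr)
        if address in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, hop)
    return best[1] if best else None
```

One route per remote subnet is enough here, in contrast to the per-VM host routes of Inspiration 1.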
Neutron Cascading
– heterogeneous L2 network across OpenStack
Page 42
A more complicated scenario is supporting a heterogeneous L2 network across OpenStack instances. This can be done with a multi-segment network in which one VxLAN network bridges the segments. The cross-OpenStack VxLAN can use the virtual remote port method and an L2 GW to make the network work.
OpenStack OpenStack
Heterogeneous L2 network across OpenStack
Use a multi-segment network across Cascaded OpenStacks.
Use one cross-OpenStack VxLAN network to bridge the local segments (VLAN or VxLAN) distributed across different Cascaded OpenStacks. An L2 relay GW is used to bridge the different segments.
[Diagram: three example networks, each with VM1, VM2 and VM3, VM4 on local segments that are bridged by VxLAN0 through L2 GWs, with an L2-Proxy in front of each cascaded OpenStack. The segment combinations are:]
CreateNetwork(VxLAN0, VLAN1, VLAN2)
CreateNetwork(VxLAN0, VxLAN1, VLAN2)
CreateNetwork(VxLAN0, VxLAN1, VxLAN2)
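The `CreateNetwork(...)` calls on this slide list one bridging VxLAN plus the local segments. One way to express such a network is a create body with a `segments` list, in the style of Neutron's multi-provider extension. The sketch below is an assumption about how the request could look, not the proxy's actual code; the helper name and the segmentation IDs are made up.

```python
def build_multisegment_network(name, segments):
    """Build a network-create body with one entry per segment.

    `segments` is a list of (network_type, segmentation_id) pairs.
    A real VLAN segment would usually also name a physical network.
    """
    return {
        "network": {
            "name": name,
            "segments": [
                {
                    "provider:network_type": network_type,
                    "provider:segmentation_id": segmentation_id,
                }
                for network_type, segmentation_id in segments
            ],
        }
    }


# CreateNetwork(VxLAN0, VLAN1, VLAN2) with made-up segmentation IDs.
body = build_multisegment_network(
    "tenant-net", [("vxlan", 5000), ("vlan", 101), ("vlan", 102)]
)
```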
Neutron Cascading – all L2 network supported
1. Local VLAN/VxLAN network inside one Cascaded OpenStack
2. VxLAN network across Cascaded OpenStacks
3. Heterogeneous multi-segment L2 network across Cascaded OpenStacks
[Diagram: Cascaded Neutron instances hosting VM1 to VM4 on local VLAN1/VxLAN1 networks, on a shared VxLAN0, and on mixed VLAN/VxLAN segments bridged by VxLAN0]
***GRE tunneling networks can be implemented in the same way as VxLAN; this work has not started yet.
Which type of L2 network to use in an OpenStack cascading scenario is a trade-off between data-path efficiency and management efficiency.
Neutron Cascading – advanced feature
Page 44
A virtual network is created only where a VM is located; all virtual networks are created lazily. If all VMs of a network are located in one Cascaded OpenStack, the network is present only in that Cascaded OpenStack.
DHCP: no DHCP cascading is required. The MAC/IP is allocated at the cascading level when Nova-Proxy creates the port for a VM. The port is then created in the Cascaded OpenStack with the MAC/IP just allocated at the cascading level. The DHCP service in the Cascaded OpenStack is configured with the MAC/IP from that port.
(DHCP occupies one subnet IP in the current community implementation. The DHCP IP in the subnet should be specifiable through the API, just like the gateway IP address.)
FIP/FW/LB/VPN: these rely on the implementation of DVR and are coming soon. (DVR was recently delayed to land in Juno-3.)
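The DHCP point above can be sketched as two steps: allocate the MAC/IP once at the cascading level, then reuse exactly those values in the port-create body sent to the cascaded Neutron. The function names and the fixed MAC/IP values below are hypothetical; the body fields (`mac_address`, `fixed_ips`) follow the standard Neutron port resource.

```python
def allocate_in_cascading(network_id, subnet_id):
    """Stand-in for the cascading Neutron allocating a MAC/IP for a new port."""
    # Fixed values for illustration; a real allocation comes from the cascading DB.
    return {
        "network_id": network_id,
        "subnet_id": subnet_id,
        "mac_address": "fa:16:3e:00:00:01",
        "ip_address": "192.168.1.20",
    }


def build_cascaded_port(allocation, cascaded_network_id, cascaded_subnet_id):
    """Build the port-create body for the cascaded Neutron, reusing the MAC/IP."""
    return {
        "port": {
            "network_id": cascaded_network_id,
            "mac_address": allocation["mac_address"],
            "fixed_ips": [
                {
                    "subnet_id": cascaded_subnet_id,
                    "ip_address": allocation["ip_address"],
                }
            ],
        }
    }


allocation = allocate_in_cascading("net-cascading", "subnet-cascading")
port_body = build_cascaded_port(allocation, "net-az1", "subnet-az1")
```

Since the cascaded port carries the same MAC/IP, the local DHCP service can answer for the VM without any DHCP state being cascaded.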
OpenStack cascading Architecture(Glance&Ceilometer/Heat/KeyStone)
Page 45
DB
Glance-API
Cascading OpenStack
Cascaded OpenStack
Sync-Manager
Sync-Driver
DB
Glance-API
Ceilometer-Proxy
Ceilometer-API
StorageEngine
Nova Cinder Neutron
Heat
Storage
Image-Store
Storage
Image-Store
DB
Glance-API
Storage
Image-Store MongoDB
Ceilometer-API
StorageEngine
hBase
Ceilometer-API
StorageEngine
KeyStone
OpenStack cascading Architecture(Glance&Ceilometer/Heat/KeyStone)
Page 46
Glance Cascading
Glance provides the image service for OpenStack. Glance cascading relies on the multiple-image-locations mechanism:
https://blueprints.launchpad.net/glance/+spec/multiple-image-locations
https://blueprints.launchpad.net/glance/+spec/image-location-selection-strategy
Image synchronization is the major task of cascading Glance. Synchronization covers the image location information (which cascaded Glance instances have the image and which do not) and the image data itself (the storage from which the image can be accessed).
Sync-Manager is developed to synchronize images between the cascading OpenStack and the Cascaded OpenStacks selected by policy. The synchronization policy can be set to synchronize an image to all cascaded Glance instances or only to selected ones. The image data can be transferred over different protocols (FTP, TFTP, HTTP, BT, …), each handled by a different driver. Just as with the multi-location mechanism provided by Glance, the specific storage deployment and image transfer method are decoupled from the Glance image location.
Ceilometer Cascading
All data is stored in the cascaded Ceilometers and is not transferred to the cascading Ceilometer. (For a 100k-host cloud, an estimated 20 GB/min of data will be collected and analyzed by Ceilometer. Only a distributed design can meet that demand.)
The cascading Ceilometer acts as an API proxy for the cascaded Ceilometers. Therefore, Ceilometer-Proxy is the storage engine of the cascading Ceilometer. Because a query may need to collect data from multiple cascaded Ceilometers, a distributed query engine middleware (e.g. Presto) will be introduced in Ceilometer-Proxy.
Heat/KeyStone Cascading
All OpenStack instances (the Cascading OpenStack and the Cascaded OpenStacks) share one KeyStone service. KeyStone is therefore a global service, and no cascading is required.
Heat consumes the API exposed by the Cascading OpenStack, so no cascading is required.
Glance Cascading
Page 47
Glance cascading solution: use the cascaded Glance as the location backend of the cascading Glance.
DB
Glance-API
Sync-Manager
Sync-Driver
DB
Glance-API
Storage
Image-Store
Storage
Image-Store
DB
Glance-API
Storage
Image-Store
Three scenarios trigger image synchronization:
1. Create image metadata and upload the image data to the default storage.
2. Create image metadata and update the location of the image. (The image data has already been uploaded to the storage; only its location is registered in Glance.)
3. Create a VM snapshot. The cascaded Nova first creates the image metadata in the cascaded Glance, then uploads the image data to storage and, once the upload has finished, registers the image location on the image. The cascading Nova then syncs the VM snapshot image to the cascading Glance and registers the image location (the access address in the cascaded Glance).
If any of the above three scenarios occurs, Sync-Manager checks the sync policy and the image owner to decide whether the image should be synchronized to other cascaded Glance instances. If so, it calls the Sync-Driver to sync the image to the cascaded Glance instances according to sync-region-list:
1) Sync the image metadata to the specified Glance
2) Sync the image data to the specified region's image storage
3) Register the image location on the image in the newly synced cascaded Glance
4) Register the new image address (as accessed in the cascaded Glance) on the cascading image
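The four sync steps above can be sketched with in-memory stand-ins. Everything in this example is hypothetical: `InMemoryGlance` is not a Glance class, the `glance://` URL form is made up, and the policy and owner checks performed by Sync-Manager are omitted.

```python
class InMemoryGlance:
    """Hypothetical stand-in for one Glance endpoint (cascading or cascaded)."""

    def __init__(self, region):
        self.region = region
        self.metadata = {}    # image_id -> metadata dict
        self.locations = {}   # image_id -> list of location URLs
        self.store = {}       # image_id -> image bytes

    def add_location(self, image_id, url):
        self.locations.setdefault(image_id, []).append(url)


def sync_image(image_id, cascading, cascaded_by_region, sync_region_list):
    """Copy one image to every region in sync_region_list (steps 1 to 4)."""
    for region in sync_region_list:
        target = cascaded_by_region[region]
        # 1) Sync the image metadata to the specified Glance.
        target.metadata[image_id] = dict(cascading.metadata[image_id])
        # 2) Sync the image data to that region's image storage.
        target.store[image_id] = cascading.store[image_id]
        # 3) Register the image location in the cascaded Glance.
        url = f"glance://{region}/images/{image_id}"  # made-up URL form
        target.add_location(image_id, url)
        # 4) Register the new address on the cascading image.
        cascading.add_location(image_id, url)


cascading = InMemoryGlance("cascading")
cascading.metadata["img"] = {"name": "demo"}
cascading.store["img"] = b"image-bytes"
cascaded = {"az1": InMemoryGlance("az1"), "az2": InMemoryGlance("az2")}
sync_image("img", cascading, cascaded, ["az1"])
```

After the call, the cascading image lists the az1 address as one of its locations, while az2 is untouched because it is not in the sync-region-list.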
Ceilometer Cascading
Page 48
Ceilometer
Heat
Cascading
AutoScaling Alarm request
Ceilometer-Proxy
Ceilometer Ceilometer
Ceilometer API calling
class Ceilometer-Proxy(base.StorageEngine) {
    UUID mapping injected by Nova/Cinder/Neutron/Glance
    Resource UUID translation
    Resource UUID and Ceilometer location addressing
    Proxy the request to the proper Ceilometer
}
The webhook setting (callback to Heat) for the alarm action is sent to the cascaded Ceilometer transparently.
The webhook (callback to HEAT)
Ceilometer cascading solution: use the cascaded Ceilometers as the StorageEngine of the Cascading OpenStack. All requests from the cascading Ceilometer are proxied to the proper cascaded Ceilometer.
Cascading OpenStack
Cascaded OpenStack
Distributed query engine with Ceilometer plug-in (e.g. Presto)
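The responsibilities listed for Ceilometer-Proxy (UUID mapping, UUID translation, location addressing, proxying) can be sketched as a small lookup-and-forward class. This is an illustration under assumptions: the class and method names are hypothetical, the backends are plain callables rather than Ceilometer clients, and the distributed query engine for multi-location queries is not modeled.

```python
class CeilometerProxy:
    """Hypothetical proxy that routes queries to the owning cascaded Ceilometer."""

    def __init__(self, backends):
        self.backends = backends  # location name -> callable(resource_id) -> samples
        self.uuid_map = {}        # cascading UUID -> (location, cascaded UUID)

    def register(self, cascading_uuid, location, cascaded_uuid):
        """Record the mapping injected when a service creates a resource."""
        self.uuid_map[cascading_uuid] = (location, cascaded_uuid)

    def get_samples(self, cascading_uuid):
        """Translate the UUID, locate the Ceilometer, and proxy the request."""
        location, cascaded_uuid = self.uuid_map[cascading_uuid]
        return self.backends[location](cascaded_uuid)


proxy = CeilometerProxy({
    "az1": lambda rid: [("az1", rid, 0.5)],
    "az2": lambda rid: [("az2", rid, 0.9)],
})
proxy.register("vm-global-1", "az2", "vm-local-7")
samples = proxy.get_samples("vm-global-1")
```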
OpenStack cascading solution & performance concerns
Page 49
In an OpenStack instance, the performance bottlenecks are the message bus and the DB. The cascading OpenStack instance has some performance advantages over a general OpenStack instance:
Nova in the cascading level
Many internal state update messages are exchanged during VM creation. The number of exchanged messages and DB accesses per VM creation is greatly reduced by periodically polling VM status from the cascaded OpenStack in batches. The traffic for host available-resource reporting and metering collection is reduced too.
Cinder in the cascading level
Similar to Nova.
Neutron in the cascading level
Because each L2/L3 neutron-proxy represents one cascaded OpenStack, and the VMs of one tenant/network are usually located in only one to three cascaded OpenStacks, the L2 population and L3 DVR population traffic is greatly reduced in the cascading level.
Glance in the cascading level
The cascading Glance functions only as a location registry, so the load is distributed and greatly reduced, which makes it possible to serve the image requests of a 100k-host cloud.
Ceilometer in the cascading level
It is almost impossible for one Ceilometer instance to handle metering and alarms for 100k hosts (or 1 million VMs). A cascading, distributed Ceilometer service makes it possible.
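The saving claimed for Nova comes from replacing per-event state messages with periodic batch polling. A minimal sketch of that idea follows, assuming the cascading level keeps a dictionary of known VM states and writes to the DB only for entries that changed; the function name and the status strings are illustrative, and the slides do not show how the actual polling is implemented.

```python
def apply_status_batch(known, polled):
    """Merge one polled batch into `known` and return only the changed entries."""
    changed = {vm: status for vm, status in polled.items() if known.get(vm) != status}
    known.update(changed)
    return changed


known = {"vm1": "BUILD", "vm2": "ACTIVE"}
polled = {"vm1": "ACTIVE", "vm2": "ACTIVE", "vm3": "BUILD"}
changed = apply_status_batch(known, polled)
```

Only `vm1` and `vm3` would need a DB write in this batch; intermediate states that a VM passed through between two polls never reach the cascading level.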
Agenda
Page 50
1. Key issues to be solved for large scale
cloud
2. Introducing OpenStack cascading
3. Large scale cloud deployment
scenario
4. Architecture & How it works
5. Availability & Progress & Team
Availability & Progress & Team
Page 51
POC Team:
Member | Contact | Role
Chaoyi Huang | [email protected] | inventor, designer of Nova/Cinder/Glance cascading, code reviewer
Hongning Wu | | inventor, designer of Neutron cascading, code reviewer
Fan Qin | [email protected] | developer of Nova cascading
Chi Zhang | [email protected] | developer of Cinder cascading
Haojie Jia | [email protected] | developer of Neutron cascading
Dong Jia | [email protected] | developer of Glance cascading
Progress:
Module | Progress
Nova | almost finished (80%)
Cinder | almost finished (80%)
Neutron | L2/L3 finished, optimization ongoing. Advanced networking features that rely on DVR (FIP/SNAT/FWaaS/LBaaS/VPNaaS) not started.
Glance | 60%
Ceilometer | Not started
KeyStone | Not required
Heat | Not required
Availability
Page 52
Public availability of the wiki page, source code, documents, installation guide, how-to-play guide, … is expected in August 2014.
An internal demo is available now; a publicly accessible demo is also expected in August 2014.
Thank you
www.huawei.com