White Paper

NFV DAY 2 - The Critical Tipping Point
Wind River Titanium Cloud vs "Vanilla OpenStack" - An Economic & Technical Comparative Analysis





EXECUTIVE SUMMARY

Virtualization is quickly becoming a critical part of Communication Service Providers' (CSPs') digital transformation journeys. The incorporation of virtualized elements into their core networks, and the delivery of virtualized services over those networks, are rapidly becoming both a competitive differentiator and table stakes for any modern operator.

Network Functions Virtualization (NFV) was introduced as a concept in 2012, and OpenStack became the de facto operating platform on which NFV would be delivered. But CSPs needed to deliver services with the same, or better, carrier grade reliability and performance as their proprietary hardware-based networks. Reservations about OpenStack's carrier grade capabilities, and thus about NFV, started immediately: at the time, IT-grade systems and open source software were simply not believed to meet telco requirements. But the last 7 years have brought considerable improvement to the OpenStack core, and it has become the de facto platform for virtualizing CSP networks. The adoption of NFV and Software Defined Networking (SDN) has since accelerated, pushing a number of commercial entities to offer their own OpenStack distributions.

These distributions come in one of three forms: Vanilla open source, Vanilla commercialized, and carrier grade. At their core, they all deliver common cloud functions, provide the same OpenStack APIs and offer a similar cloud operating environment. But only the carrier grade variant has incorporated the capabilities and characteristics necessary to deliver the performance, reliability, supportability and manageability demanded by CSPs.

Most CSPs have tried to use a Vanilla flavored version of OpenStack to support NFV and have learned the hard way that OpenStack is more complicated, has more moving parts, has proven more difficult to implement, and is in practice 10x harder to operate than expected. For many, OpenStack and NFV have entered Gartner's "Trough of Disillusionment", simply not delivering on the promise the experts said they would experience.

This is the dilemma: most CSPs get past the install (day 0) and have some success with cloud deployment and configuration (day 1), but fall flat once they try to actually operate OpenStack (day 2). The economic impact of not properly assessing the day 2 implications of their platform selection costs CSPs time, money and opportunity, both during deployment and, even more critically, once the platform is supporting production services.

IDX is a technology and systems integrator that specializes in the CSP market and has been implementing and operating OpenStack and OpenStack/NFV for over 7 years. This paper describes IDX's experiences through the delivery life cycle of OpenStack. It compares the experiences, technical implications and economic impact of utilizing a carrier optimized OpenStack platform versus a commercialized Vanilla distribution, and the effect each platform has on day 2 operations.

IDX's analysis found that Wind River's Titanium Cloud demonstrates significant operational and economic advantages over Vanilla OpenStack distributions, particularly for NFV. Through its hardening of open source software and its carrier grade optimizations, Titanium Cloud delivers a platform that CSPs can rely on to virtualize their network infrastructure and deliver on the promise of NFV while also cost efficiently delivering on critical day 2 requirements.



TABLE OF CONTENTS

EXECUTIVE SUMMARY
OPENSTACKS ARE NOT EQUAL
DAY 0 – The starting point
DAY 1 – Getting the environment built
DAY 2 – Keeping the lights on
Titanium Cloud vs. Vanilla OpenStack - Comparative Analysis
    Analysis Process
    Key Differences – Architecture
    Key Differences – Day 0 & 1
    Key Differences – Day 2
The Economic Impact of Optimized Carrier Grade OpenStack
Summary

OPENSTACKS ARE NOT EQUAL

OpenStack is a standardized set of components designed to deliver a cloud operating platform. It has been, and continues to be, developed by a community of open source contributors. To help companies utilize the platform effectively, a number of professional software enterprises took the core OpenStack components and packaged them into commercial distributions. Their objective has been to create professional, integrated and standardized infrastructure software that can be consumed successfully by all types of businesses, including CSPs.

Commercial distributions implement a standard or enhanced OpenStack architecture, define and implement specific methods for installing their software distribution, define and implement methods for deploying and configuring their OpenStack cloud, and provide methods and procedures for operating and maintaining the cloud once it has been deployed.


Architecturally, distributions vary. Some incorporate a separate control plane framework while others integrate the control plane into the core; some establish an under-cloud for the control plane and a separate over-cloud for the data plane, while others simply build an integrated cloud that incorporates all features. The nature of OpenStack offers flexibility in how distributions are designed, so a number of variations have emerged. Each distribution builds high availability and durability into the control plane, but they do so using different approaches. Some implementations are straightforward, using simple, tried and true architectures, while others implement more complicated, and often more fragile, designs. Many of the commercialized distributions demonstrate the latter characteristics, while variants engineered specifically for industrial and carrier operations, such as Wind River Titanium Cloud, are designed specifically to support the performance and operating needs of CSPs.



DAY 0 – The starting point

Day 0 represents the starting point for creating an OpenStack cloud infrastructure. It is the technical entry point for the construction of the cloud framework and typically involves the physical installation of the OpenStack software onto a defined set of computers (physical or virtual nodes) that will ultimately become the cloud operating platform.

Vanilla and carrier-grade distributions use different technical architectures (i.e. under-cloud/over-cloud, bootstrap, integrated cloud), and those architectures shape the methods executed during day 0 to install the OpenStack software.

DAY 1 – Getting the environment built

Day 1 starts after the day 0 process has completed successfully. It can vary substantially based on the OpenStack distribution used and whether the distribution was developed to serve the needs of mainstream businesses or CSPs.

Many Vanilla OpenStack distributions have complex and time-consuming methods that require deployment template development, validation and remediation before an actual deployment can be executed. This process is slow and error prone; it can take hours, days or even weeks to complete, particularly if the templates are complex. In contrast, carrier-grade distributions have focused specifically on this area to simplify cloud deployment, increase predictability, eliminate errors and shorten any remediation efforts that do arise. In these optimized implementations, deployment configuration is performed through direct, synchronous configuration via the cloud management interface rather than through asynchronously developed and then executed templates, simplifying the process, accelerating the effort and reducing errors.

DAY 2 – Keeping the lights on

Day 2 represents the day your OpenStack cloud begins its operational life, and every day forward until it is decommissioned from service. It is the living, breathing version of your cloud and requires the care and feeding necessary to keep it alive, growing and healthy. The ongoing viability and durability of a cloud is important in all deployments, but it is especially critical when the cloud underpins services responsible for critical infrastructure such as communication networks. As communications service providers virtualize their networks, deploy NFV on top of OpenStack and become more integrated into the strategic infrastructure of the jurisdictions they support, the reliability and durability demanded of those networks must achieve the highest operational level: 24/7/365, always on. This dictates that the architecture and operational capabilities of the underlying OpenStack platform be engineered to meet this requirement.

For carrier NFV on OpenStack, day 2 requires industrial strength capabilities that enable seamless, online, concurrent upgrades, updates, patches and remediation, as well as real time monitoring and platform assurance.


Titanium Cloud vs. Vanilla OpenStack - Comparative Analysis

IDX has worked with commercialized Vanilla OpenStack distributions for many years, and for the last four years as a platform for Network Functions Virtualization (NFV) with CSPs. IDX performs day 0, day 1 and day 2 activities as part of its OpenStack/NFV solution and managed service offerings. This background has given IDX a unique perspective for assessing the day 0, day 1 and day 2 profiles of a commercialized Vanilla distribution against a carrier optimized deployment.

As part of its own effort to assess the best options for delivering OpenStack/NFV to its CSP customers, IDX performed a quantitative and qualitative analysis of Wind River's Titanium Cloud platform against a commercialized Vanilla OpenStack platform it has deployed and serviced at a major North American CSP.

The objective of the analysis was to compare and contrast the technical, operational and economic variances between these platforms. The goal was to assess whether an optimized carrier grade distribution offers consequential benefits, both operationally and economically, to CSPs.

Analysis Process

In conducting the analysis, IDX first identified the key day 0, day 1 and day 2 activities its teams have been executing over the last three years on the Vanilla distribution. Quantitative and qualitative data were extracted from the historical records and experience of the IDX engineering and operations teams. The data extracted from the Vanilla platform became the baseline for comparison.

IDX then created a complete Titanium Cloud operating environment functionally comparable to the Vanilla distribution used at the CSP. An equivalent set of day 0, 1 and 2 activities was then executed in this environment, and the quantitative and qualitative outputs were collected and compared against the Vanilla platform results.

The analysis also compared the general architecture of each distribution as well as the particular approach each took to the command and control plane.

Quantitative metrics included procedural steps, manpower effort, chronological execution time, operation success/failure rates, remediation occurrences, remediation steps and others. These metrics included costs related to human resource time for standard operations, failure and downtime remediation, upgrade/update/patch activity, repeated effort due to failure and more. They also included capital investment metrics (hardware and other physical asset costs) for each platform.

Qualitative analysis considered system complexity, execution difficulty, failure frequencies, knowledge level requirements, and other elements affecting the quality of operations and the ability to achieve high quality delivery.

Key Differences – Architecture

The design of the Vanilla vs. Titanium Cloud OpenStack architectures is one of the key differences affecting the durability of each platform. While on the surface the visual representations appear similar on many levels, the key variance is in how each distribution implements the command and control layer. The Vanilla platform decouples the cloud into separate under-cloud and over-cloud layers (see illustration), which introduces installation, template development, configuration and operational dependencies that are not required in the Titanium Cloud architecture. For the Vanilla implementation, there is a multi-step day 0 process that starts with the creation of the under-cloud platform, followed by the installation of the deployment and configuration engine. Only then can work begin on developing the templates that describe how the over-cloud will be constructed, and only then can day 1 activities begin. Over the last three years, IDX's experience has shown that this process

Page 6: IDX WhitePaper - WR Titanium vs Vanilla - NFV DAY 2 - The ...events.windriver.com/wrcd01/wrcm/2019/04/nfv-day2-the-ciritical... · DAY 1 – Getting the environment built DAY 2 –

6 | White Paper

NFV DAY 2 - The Critical Tipping PointWind River Titanium Cloud vs “Vanilla OpenStack”

An Economic & Technical Comparative Analysis

can take considerable effort and time, is vulnerable to frequent failures, and has demonstrated platform fragility.

In contrast, the Titanium Cloud model is designed for simplicity and efficiency. The day 0 and day 1 activities are integrated into a straightforward "install from ISO image" procedure that first creates the initial controller node, which in turn is used to build and deploy the full cloud, including the second controller and all required compute/storage nodes, all through direct, real-time GUI interaction.

Figure 1 - Vanilla OpenStack vs. Titanium Cloud (the Vanilla controller cluster spans three nodes running HAProxy, Pacemaker, Keystone, Neutron and Cinder, with RabbitMQ mirrored queues and Galera multi-master MariaDB replication; the Titanium Cloud cluster is two controller nodes in an active/passive pair, each running cloud, maintenance and inventory services under controller cluster management)

Even the High Availability model differences have proven to have a direct impact on the economic cost of each platform and subsequently on the operational resiliency of each environment, the latter having a direct and meaningful effect on the management costs of day 2 operations.

Vanilla implementations typically use an N+1 HA model, but individual controller services offer varying availability implementations, some operating active-active and others active-passive. This creates operational variability and has produced stability issues in the environments IDX has deployed, resulting in platform failures and controller service disruptions that have impacted the durability and availability of the platform; two characteristics that negatively affected deployments in carrier environments and had direct economic impacts.

Comparatively, optimized platforms such as Titanium Cloud have chosen straightforward, robust and historically validated approaches to HA. In the case of Titanium Cloud this means the tried and true active/passive node redundancy design. It is simple, battle tested and predictable, and has historically demonstrated high resilience in critical infrastructure. This approach simplifies the configuration, management and operation of the cloud's control plane and better ensures the uptime demanded by carrier infrastructure.

Figure 2 - Vanilla OpenStack HA Model
Figure 3 - Titanium Cloud HA Model
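As a rough illustration of why the active/passive design is simple to reason about, the following Python sketch (hypothetical, not Titanium Cloud code) models a two-controller pair in which the passive node promotes itself when the active node stops heartbeating:

```python
import time

class Controller:
    """Minimal sketch of one node in an active/passive controller pair."""
    def __init__(self, name, role):
        self.name = name
        self.role = role                        # "active" or "passive"
        self.last_heartbeat = time.monotonic()

    def heartbeat(self):
        """Called periodically by a healthy active controller."""
        self.last_heartbeat = time.monotonic()

def check_failover(active, passive, timeout=5.0):
    """Promote the passive controller if the active one stops heartbeating."""
    if time.monotonic() - active.last_heartbeat > timeout:
        active.role, passive.role = "failed", "active"
    return passive.role

ctl0 = Controller("controller-0", "active")
ctl1 = Controller("controller-1", "passive")
ctl0.last_heartbeat -= 10          # simulate 10 seconds of silence
print(check_failover(ctl0, ctl1))  # -> active
```

There is exactly one state transition to manage, which is why the pattern has remained predictable in critical infrastructure; contrast this with per-service HA where each service follows its own failover rules.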

[Diagram: in the Vanilla model, an undercloud (deployment/configuration services) deploys, configures and manages an overcloud of three active controller nodes plus compute and storage nodes; in the Titanium Cloud model, an integrated cloud with an active/passive controller pair deploys, configures and manages the compute and storage nodes directly.]



Key Differences – Day 0 & 1

Comparing the manpower effort and corresponding execution time of the various day 0 and day 1 activities demonstrates a significant gap between Vanilla and optimized platforms. When comparing the initial installation and deployment effort (day 0 and 1), the variance in the number of procedural steps in each implementation's model is dramatic, as illustrated in Figure 4: in many cases a three- to four-fold variance across install and deploy.

Figure 4 - Install/Deployment Effort Comparison

The gap is highlighted even more by the dramatic difference in the actual measured chronological time each platform required to complete the day 0 portion of the rollout (Figure 5).

Day 1 shows a similar contrast. IDX's deployments of Vanilla OpenStack based clouds have ranged from days to weeks, the time variance attributable in part to the different node configurations required: compute vs. storage, compute nodes requiring specific CPU, memory and disk configurations, the application of optimized network services such as SR-IOV or DPDK, and other scenarios. In Vanilla environments, considerable time is spent developing, testing and validating deployment templates, and if those templates are incomplete or in error, the time and effort to remediate until a successful deployment is achieved can be significant. There is also the effect of the typical monolithic, "CLOUD at a Time" approach to deployment used by most Vanilla distributions: the process succeeds or fails as a whole, so failed deployments become incredibly expensive in both time and economic cost.

The equivalent day 1 process under Titanium Cloud is a straightforward selection of options in a prescriptive user interface. The operator makes choices in an interactive interface or through simple commands, defining the type, number and characteristics of the nodes directly. Nodes can share the same configuration or be configured to the operational requirements of the cloud workload. There is no templating requirement, and the day 1 process is managed modularly, a node at a time. If specific deployment actions fail for any reason, they affect only the node(s) in question while the remaining deployment proceeds uninterrupted, and remediation of failures can be managed independently, limiting impact to the broader cloud. This model saves time, mitigates operational impact and reduces day 1 time and effort substantially, which saves costs.

Figure 5 - Deployment Execution Time (for the Vanilla platform these metrics reflect averages over 3 years of deployments)
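The node-at-a-time day 1 flow can be sketched in Python (a hypothetical illustration, not actual distribution tooling): each node deploys independently, so one failure never aborts the rest of the rollout.

```python
def deploy_node(name, fail=False):
    """Stand-in for the real per-node deployment work."""
    if fail:
        raise RuntimeError(f"{name}: deployment failed")
    return f"{name}: deployed"

def deploy_cloud(nodes):
    """Deploy each node independently; a failure only affects that node."""
    results = {}
    for name, should_fail in nodes:
        try:
            deploy_node(name, should_fail)
            results[name] = "deployed"
        except RuntimeError:
            # Under a monolithic "CLOUD at a Time" model this would
            # invalidate the entire rollout; here it is isolated.
            results[name] = "failed (retry independently)"
    return results

nodes = [("controller-1", False), ("compute-0", True), ("compute-1", False)]
print(deploy_cloud(nodes))
```

In the example, compute-0 fails but controller-1 and compute-1 still complete, mirroring the modular remediation behavior described above.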


Key Differences – Day 2

Day 2 presents an even more significant difference between the implementations. Vanilla OpenStack variants generally apply monolithic methodologies when deploying, supporting and operating a cloud. Across many functions, the concept of a "CLOUD at a Time" (CAAT) is applied. This methodology affects how the cloud is constructed, deployed, patched and upgraded, and how it supports the scale up and scale down of resources and nodes. It presents challenges in terms of:

- Deployment performance: any add, change or delete operation requires a re-validation of the entire cloud, which can take hours to complete and requires the cloud configuration to be 100% operationally valid for the modification to be successfully applied.

- Failure effect: if an add, change or delete operation fails, even on a single element, the entire cloud can be deemed invalid, ultimately requiring the entire process to be corrected and re-executed. The cycle time to recognize a failure and re-execute the process can be tedious, time consuming and costly.

- Template driven: each unique node configuration requires the development of a specific, customized template. Developing and testing a template can be an arduous process; if the template contains errors, deployment of that node type will fail, and that failure will in turn domino into a cloud validation failure. This is a tedious, time consuming and inefficient process.

- No state machine: Vanilla OpenStack distributions typically do not implement a state machine. This allows for potential fidelity issues between the control plane (under-cloud) and the data plane (over-cloud), presenting an inconsistent operational state between the two (e.g. the control plane thinks a compute node is in maintenance mode while the data plane thinks the node is operational and continues to schedule work on it) and putting the entire cloud at risk.

In comparison, Titanium Cloud uses a modular methodology for managing its cloud, nodes and resources. It applies a "NODE at a Time" (NAAT) approach in which each component/node is managed as a discrete element, so any issue with a component/node is restricted to that portion of the cloud and can be resolved independently of all other components:

- Deployment performance under the NAAT model is fast: add, change and delete operations generally take minutes and are completed using wizard-based execution or one or two commands on the command line.

- Operational attributes can be modified node by node and affect only the node(s) in question. Errors resulting from such changes likewise affect only those nodes and are resolvable at the node level without affecting the operational state of the cloud or any other nodes or cloud components.

- Each node can have a very distinct network setup, with interfaces enabled with specific attributes (SR-IOV, DPDK, etc.), and can be associated with a discrete set of networks.

- Cloud and node configuration is facilitated through system commands and scripts, not through templates.

- Titanium Cloud ensures the operational safety of its framework using a state machine, avoiding the potential fidelity issues found in Vanilla deployments.

- Nodes must be placed into specific states before management operations can take place (e.g. to remove a node from the cloud, the node must be placed in a LOCKED state, and to be LOCKED, all workload must first be evacuated from the node). This prevents inadvertent management operations from affecting tenant activity or causing cloud corruption.
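The state-gated management model in the last point can be sketched in Python (hypothetical names and transitions, loosely modeled on the LOCKED/UNLOCKED behavior described above, not Wind River's implementation):

```python
class Node:
    """Sketch of a node whose state gates management operations."""
    def __init__(self, name):
        self.name = name
        self.state = "UNLOCKED"     # operational, accepting workloads
        self.workloads = []

    def evacuate(self):
        """Drain tenant workloads (a real platform would live-migrate them)."""
        migrated, self.workloads = self.workloads, []
        return migrated

    def lock(self):
        """Refuse to lock while tenant workloads are still running."""
        if self.workloads:
            raise RuntimeError("evacuate workloads before locking")
        self.state = "LOCKED"

    def remove(self):
        """Destructive operations are only legal from the LOCKED state."""
        if self.state != "LOCKED":
            raise RuntimeError("node must be LOCKED before removal")
        self.state = "REMOVED"

node = Node("compute-3")
node.workloads = ["vm-1", "vm-2"]
try:
    node.lock()                    # refused: tenants still running
except RuntimeError as err:
    print(err)
node.evacuate()
node.lock()
node.remove()
print(node.state)                  # -> REMOVED
```

The state machine is what prevents the control-plane/data-plane drift described for Vanilla deployments: an operation that is illegal in the current state simply cannot be issued.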


[Patching comparison chart (Figure 7): Vanilla vs. Titanium Cloud across version/release patch bundles, individual patches/packages, certified/tested patches, patch rollback, automated patch rollback, tenant workload migration (automated vs. manual) and patch application (automated vs. manual).]

DAY 2 Capabilities (Vanilla | Titanium Cloud):

Operational Model: Monolithic - CLOUD at a Time | Modular - NODE at a Time
Control Plane Dependency: Undercloud (deployment engine) + controllers | Singular controller
Configuration Management: Complex, time consuming & error prone | Atomic, efficient, simple
Cloud Customization: Templates | System commands & scripts
Node Configuration: Monolithic | Modular
Configuration Method: Templates | API calls
Node Add/Remove/Reconfigure Process: Hours | Minutes
Built-in Control Plane Backup: No, requires custom scripting | Yes
Upgrade Releases: No | Yes
In-Place Upgrades: No | Yes
Official/Tested Bundled Patch Releases: No | Yes
Integrated Alarming/Assurance Services: No | Yes
Automated Tenant Workload Live Migration: No, manual | Yes

A critical difference between carrier grade and Vanilla platforms is the ability to support real time patching and upgrades. These are mandatory characteristics of digital infrastructure platforms designed to support CSPs. Non-carrier grade variants generally do not support these requirements.

Vanilla implementations, even ones backed by commercial vendors, have not applied the structural rigor needed to support effective patching. Many have retained the free spirit nature of the open source paradigm and do not formalize patch bundles or even certify individual patches. Much of this is left to the consumer's discretion to figure out, which leaves no prescriptive or predictable outcome. This creates considerable risk, because without certification, patches have not been tested and validated against a known state, introducing instability, unpredictable results and often operational failures. This is particularly concerning given that the CAAT methodology applies patches with the same scope in which it deploys nodes: a failure during patch application can result in a full cloud reset event, with direct operational effects including unplanned downtime of the entire cloud.

For carrier grade platforms, real time, non-disruptive, production level patching is a major differentiator, and one that carrier grade versions invest heavily in. Such OpenStack implementations incorporate prescriptive, defined and documented methods for testing, certifying and validating patch bundles. Like commercial operating system vendors, Wind River Titanium Cloud demonstrated the requisite checks, balances and automation necessary to support real time, non-disruptive patch application and rollback on live, production infrastructure. This is a mandatory capability for CSP digital infrastructure.
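The per-node patch-and-rollback behavior described here can be sketched in Python (a hypothetical illustration of the concept, not Wind River's implementation):

```python
def apply_patch(node, patch, health_check):
    """Apply a certified patch to one node; roll back automatically on failure."""
    previous = node["version"]
    node["version"] = patch
    if not health_check(node):
        node["version"] = previous     # automated rollback to the known state
        return "rolled back"
    return "patched"

node = {"name": "compute-0", "version": "19.04"}
healthy = lambda n: True               # stand-ins for real post-patch checks
broken = lambda n: False

print(apply_patch(node, "19.04.1", healthy))  # -> patched
print(apply_patch(node, "19.04.2", broken))   # -> rolled back
print(node["version"])                        # -> 19.04.1
```

Because the scope is a single node, a failed patch leaves the rest of the cloud untouched; under a CAAT model the same failure could invalidate the whole cloud.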

Titanium Cloud ticks all the boxes required to support critical production infrastructure. Vanilla OpenStack has proven limited and heavily reliant on operators performing extraordinary technical acts to facilitate patch application while still trying to maintain live operation of their cloud.

Figure 6 - Day 2 Capabilities Comparison

Figure 7 - Patching Comparison


[Upgrade comparison chart (Figure 8): Vanilla vs. Titanium Cloud across the need for an equivalent non-production environment to validate upgrades, non-disruptive release upgrades, official upgrade versions/releases, and whether upgrades rebuild the cloud.]

Similar differences exist around cloud upgrades. Vanilla environments have no effective methods or procedures for supporting full cloud software upgrades. For many, an upgrade becomes a one-shot deal, so to de-risk the process and increase the odds of success they must test the upgrade on a clone of their production environment. This means they need a second cloud, equivalent to the one being upgraded, on which to test the upgrade, correct any issues and validate success before ever executing against the real environment. This is extremely costly on multiple fronts and still provides no certainty of success. Additionally, the vast majority of Vanilla OpenStack distributions do not provide automated tenant workload failover, so during the upgrade process operators must manually migrate workloads off the nodes to be upgraded and then manually migrate them back once the upgrade completes, further complicating the process and increasing the possibility of failure.

In contrast, much like patching, carrier grade OpenStacks have gone to great lengths to test, validate and provide rollback vehicles for upgrading production clouds. They have also engineered automated workload relocation into their platforms, so as nodes are run through the upgrade, the workloads they originally housed are automatically migrated to a different node and then repatriated once the upgrade has completed. Again, Titanium Cloud checks all the required boxes.

Figure 8 - Upgrade Characteristics Comparison

One of the most compelling contrasts between Vanilla OpenStack and one optimized for carriers is the economic impact. As part of the basis for this white paper, IDX performed a business case calculation to assess the economic costs in terms of capital and operational expenditures. The calculation was based on a 1,000-node implementation, deployed and operated as 10 clouds of 100 nodes each, managed and operated over a 5-year term.

For the business case, IDX normalized the cost metrics for the software component to account for the differences in how the commercial vendor charged for Vanilla OpenStack versus how Wind River charged for its software. Vanilla OpenStack is delivered under a 100% subscription support model. At the time of IDX's economic analysis, Wind River charged an initial upfront licensing and royalty fee, categorized as a Capex expenditure in the analysis, plus an annual support fee. Since this analysis, Wind River has updated its Titanium Cloud economic model to a consumption framework that charges an annual per-node fee covering both license and support. Additional nodes can be added to a Titanium Cloud at any time, as required.

Because the analysis was done using the previous Wind River Titanium Cloud cost model, the upfront fees were amortized over the 5-year term to normalize the figures for comparison. Hardware costs were not included in the business case: they were comparable between the two environments, and the negligible difference would not have materially changed the conclusions. Build costs are reflected as a one-time charge and presented as Capex expenditures. Operations and patching are ongoing costs and are presented as Opex expenditures.
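The normalization step above can be made concrete with a short calculation. The paper does not publish its raw cost inputs, so the dollar figures below are hypothetical placeholders; only the method (amortizing a one-time fee over the 5-year term so it can be compared against a pure subscription) reflects the text.

```python
# Hypothetical normalization sketch. The white paper's actual cost inputs are
# not published; the fee amounts below are placeholders for illustration only.

TERM_YEARS = 5

def amortized_annual_cost(upfront_fee, annual_support):
    """Spread a one-time upfront fee over the term, then add the recurring fee."""
    return upfront_fee / TERM_YEARS + annual_support

# Placeholder inputs, per 100-node cloud:
titanium_annual = amortized_annual_cost(upfront_fee=500_000, annual_support=100_000)
vanilla_annual = amortized_annual_cost(upfront_fee=0, annual_support=250_000)

print(f"Titanium (normalized):  ${titanium_annual:,.0f}/yr")
print(f"Vanilla (subscription): ${vanilla_annual:,.0f}/yr")
```

The point of the normalization is that once the upfront fee is spread across the term, both models reduce to a comparable annual run rate.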

The business case analysis showed nominal Capex variance between the two environments, but the Opex difference was dramatic: Titanium Cloud Opex was 65% lower than the Vanilla platform's over five years.

The Economic Impact of Optimized Carrier Grade OpenStack

Page 11: IDX WhitePaper - WR Titanium vs Vanilla - NFV DAY 2 - The ...events.windriver.com/wrcd01/wrcm/2019/04/nfv-day2-the-ciritical... · DAY 1 – Getting the environment built DAY 2 –

NFV DAY 2 - The Critical Tipping PointWind River Titanium Cloud vs “Vanilla OpenStack”

An Economic & Technical Comparative Analysis

IDX is an industry leading, innovation-first technology solutions and integration provider. We deliver sophisticated, elegant and comprehensive infrastructure solutions to the Telecom, Service Provider and Large Enterprise markets. IDX Labs is the R&D division of IDX, providing cutting edge research and development services for the CSP market. © 2019 Interdynamix Systems. IDX, IDX Labs and the IDX|IDX Labs logos are trademarks of Interdynamix Systems. Wind River and Titanium Cloud are registered trademarks of Wind River Systems, Inc.

www.interdynamix.com

The vast majority of the savings is directly attributable to the reduced operating costs from more effective and efficient day 2 operations of the Titanium Cloud platform.

IDX also evaluated the cost of onboarding technical resource capacity: the skill and knowledge development necessary to execute Day 0, 1 and 2 activities competently within each environment. The analysis showed this to be considerably more challenging and costlier under Vanilla distributions than under prescriptive implementations such as Titanium Cloud. As part of the analysis, IDX considered the time required for a technical resource to reach minimum competency in both the Vanilla and Titanium Cloud deployments. The difference was considerable: for Vanilla OpenStack the average ramp-up time was 3 months; for Titanium Cloud it was 3 weeks.
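The ramp-up figures translate directly into onboarding cost. The 3-month versus 3-week durations come from the IDX analysis; the team size and loaded weekly rate below are assumed placeholders, since the paper does not publish them.

```python
# Hypothetical onboarding-cost sketch. The 13-week vs 3-week ramp-up times are
# from the IDX analysis; headcount and loaded rate are assumptions.

LOADED_RATE_PER_WEEK = 3_000  # assumed fully loaded engineer cost per week
ENGINEERS = 10                # assumed team size

def ramp_cost(weeks):
    """Cost of bringing the whole team to minimum competency."""
    return weeks * LOADED_RATE_PER_WEEK * ENGINEERS

vanilla_cost = ramp_cost(weeks=13)   # ~3 months
titanium_cost = ramp_cost(weeks=3)   # ~3 weeks

print(f"Vanilla ramp-up:  ${vanilla_cost:,}")
print(f"Titanium ramp-up: ${titanium_cost:,}")
```

Whatever rate and headcount are assumed, the ratio between the two platforms (roughly 4.3x) is fixed by the ramp-up durations themselves.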

Figure 9 - 1000 Node - 5 Year Cost Comparison

SUMMARY

OpenStack has become the de-facto foundation for NFV, but the open source base on its own is proving insufficient to deliver the promised agility and cost savings. The analysis suggests that an optimized platform such as Titanium Cloud, based on an OpenStack distribution focused on reliability, durability and specifically carrier grade Day 2 operations, is a better option for CSPs than commercialized Vanilla OpenStack, both technically and economically.

The Titanium Cloud platform runs virtual functions with carrier grade reliability and enhances VNF performance to deliver a carrier level experience. Its straightforward architecture, tried and true high availability design, and Day 2 optimized operating framework translate directly into real business benefits for CSPs, including comparable capital costs, lower operating costs and the flexibility to deploy services quickly and scale them dynamically. Through hardened open source software and carrier grade optimizations, Titanium Cloud delivers a platform that CSPs can rely on to virtualize their network infrastructure and deliver on the promise of NFV.

[Figure 9 chart residue: a bar chart (y-axis "$ Cost", scale $0 to $20,000,000) comparing Vanilla and Titanium Cloud Capex and Opex across Software License, Subscription & Support License, Build - PODs, Operate - PODs and Patching - PODs over the 5-year term, with 5-year total cost.]