Automation + Machine Learning = Hands Free NFV
A Word On “Automation through ML for OpenStack NFV”

PRAKASH RAMCHANDRAN
MICHAEL TIEN
JAYANTHI A GOKHALE

01.11.2017



Agenda

• NFV Automation Challenges? Automation of NFV Service & NFV Infrastructure
• What’s new in Standards? ZSM, the new evolving standard for zero-touch NFV
• What can ML bring to automation? Manual to ML-driven intelligent automation
• Practical viewpoint from Dell Labs? Redfish
• Practical viewpoint from Dell Labs? Swordfish
• What’s next? The industry is moving towards E2E Orchestration and Management

@OpenStack

NFV Automation Challenges

Dell

openstack openstack OpenStackFoundation

NFV Adoption Challenge

• Lack of end-to-end automation
• Lack of interoperability between NFVI/VNF and VNF/VNF
• Unpredictable datacenter planning
• Lack of service awareness
• APIs are not easily consumable
• Limited service-to-service programmability
• Not zero-touch
• Requires various resources to maintain (IT, DevOps, Operations)
• Requires multiple POCs

Today’s Datacenter Challenge

Automation - the key to unlocking future efficiencies


What’s new in Standards

Dell


ETSI ISG ZSM

Zero-touch network and Service Management

• M2M communications, provisioning, and management
• Dynamic service chain data mapping
• Dynamic policy enhancement and enforcement
• Continuous data collection and analytics
• Automatic reactive and proactive self-healing
• Real-time datacenter capacity scheduler
• Autonomous end-to-end orchestration lifecycle management
• Intelligent service-state awareness optimization
• Smart API

Zero-touch NFV provides true next-generation NFVaaS or VNFaaS

What is Machine Learning?

Machine learning is a way for infrastructure or platforms to learn progressively from input data, validate models of system behavior, and use those models to attain desirable outcomes (e.g., addressing FCAPS concerns, in telco terms: fault, configuration, accounting, performance, and security).

Why Does Automation need ML?

In our case the goal is anomaly detection for systems, networking, and network functions, based on FCAPS.

This can be done by supervised, unsupervised, or dynamic learning.

The basic requirement for this is a closed-loop control mechanism.

Self-healing within layers (local policy) and ML across layers (global policy)
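A closed loop of this kind can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses a simple z-score test as a stand-in for a trained ML model, and the latency readings, the window size, and the "restart VNF" remediation are hypothetical.

```python
from statistics import mean, stdev

def is_anomaly(history, sample, z_threshold=3.0):
    """Flag `sample` if it is more than z_threshold standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return sample != mu
    return abs(sample - mu) / sigma > z_threshold

def closed_loop(samples, remediate, window=20):
    """Monitor -> detect -> act: call remediate(index, sample) for each anomalous sample."""
    history, actions = [], []
    for i, sample in enumerate(samples):
        if is_anomaly(history[-window:], sample):
            actions.append(remediate(i, sample))
        else:
            history.append(sample)  # learn only from samples judged normal
    return actions

# Hypothetical latency readings in milliseconds, with one spike.
latencies = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 55.0, 10.1]
actions = closed_loop(latencies, lambda i, s: "restart VNF (sample {} = {} ms)".format(i, s))
print(actions)
```

Here the local policy is the remediation callback; a cross-layer (global) policy would combine detections from several such loops before acting.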

What’s new in Standards / Open Source for NFV Stack

Connected Vehicle Application | Service / Application | OSS/BSS | TR 188 004, Open Policy Agent
Network Slicing | NS | NFVO/SDNO | ONAP, SDNC
Container workload | VNF | VNFM | EMS, VNF SDK
Kata, CNI, NVMe | VM, VN, VS | VIM | Containerized OS


What can ML Bring to Automation

Gokhale Jayanthi


Traditional Manual Deploy Cycle

The changing landscape of Infrastructure

• Bare metal

• Hypervisor

• VM – Booting, Secure Booting, Booting from Volume
• Container – CRI & CNI
• Lightweight VM – Kata (Intel)
• VM in Container – not yet named (Red Hat)

Why NFV Automation needs innovation?

• NFV and SDN integrated clouds are growing from centralized to geographically scattered and massively distributed clouds.
• Orchestration, management, and maintenance have therefore become more challenging and require more attention to distributed, hybrid clouds; the need of the hour is accelerated service velocity.
• Automation is a prime solution for provisioning and maintaining the complete environment.
• With a mix of intelligent infrastructure and machine learning, it is possible to target dynamic cloud management.
• We focus here on service and infrastructure management automation.
• We share our experience dealing with compute (Redfish), storage (Swordfish), and networking (SDN-WAN), how we add closed loops, and the benefit derived from data collection and analysis with ML.
• This leads to Hands Free or Zero Touch NFV.

Some Statistics

• 80% of outages impacting mission-critical applications are caused by people and process issues
• 50% of these are caused by change, configuration, handoff, release integration, re-deployable application services, etc.
• Though the number of downtime hours is reduced, cost of downtime is now 50X
• Automate the deployment process, intelligently

The Learning Input Points

Intelligent Infrastructure Deployment

• Identify smart ways to create, manage and orchestrate federation environments.
• ML can be utilized to train AI systems to recognize demand and deployment patterns in the context of various Service Level Objective metrics, called dimensions, like
  o Number of VM instances
  o Network demand
  o Migration metrics
  o Latency measures
  o SLA parameters of throughput
  o Number and type of SLA violations
  o Cluster sizes
  o AZs
• ML can be used to devise optimized containers, container sizing, and planning of microservices
• Results in true Agile Infrastructure provisioning.

Intelligent Infrastructure

• Service providers can easily and efficiently accommodate the demands of mixed workloads from a single platform.
• Leveraging QoS capabilities, policies can be provisioned and enforced to isolate each workload while workloads run simultaneously within a shared infrastructure.
• ML needs vast amounts of real-time performance data, generated by a QoS monitor and network telemetry, to recognize developing performance issues early, before they negatively impact the human experience. The ML then provides information to fine-tune or redeploy the infrastructure to optimize the QoS metrics.

Automated Deployment Process

• Static Deployment
  o Templated. Flavours can be selected based on requirements.
• Dynamic
  o Dynamically determine the deployment context and deployment parameters, then define the deployment plan. Once defined, it remains static.
• Smart / Intelligent Deployment
  o ML and AI driven deployment to optimise the Service Level Objectives. The deployment plan is predicted, evaluated, customized and optimized.
• A TOSCA document is used as the deployment description: it describes the services and applications to be deployed on the cloud.
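The difference between the static and dynamic styles can be illustrated with a small sketch. It is an assumption-laden example, not the authors' tooling: the flavour table mirrors the classic default OpenStack flavours (real clouds define their own), and the sizing rule in `dynamic_requirements` is purely hypothetical.

```python
# Default-style OpenStack flavours as (name, vCPUs, RAM in MB); real clouds define their own.
FLAVORS = [
    ("m1.tiny", 1, 512),
    ("m1.small", 1, 2048),
    ("m1.medium", 2, 4096),
    ("m1.large", 4, 8192),
    ("m1.xlarge", 8, 16384),
]

def select_flavor(vcpus, ram_mb, flavors=FLAVORS):
    """Static style: return the smallest flavour that meets the stated requirements, or None."""
    candidates = [f for f in flavors if f[1] >= vcpus and f[2] >= ram_mb]
    if not candidates:
        return None
    return min(candidates, key=lambda f: (f[1], f[2]))[0]

def dynamic_requirements(expected_sessions, sessions_per_vcpu=500, mb_per_session=4):
    """Dynamic style (hypothetical rule): derive the requirements from the deployment context."""
    vcpus = -(-expected_sessions // sessions_per_vcpu)  # ceiling division
    return vcpus, expected_sessions * mb_per_session

static_choice = select_flavor(2, 3000)
dynamic_choice = select_flavor(*dynamic_requirements(1800))
print(static_choice, dynamic_choice)
```

A smart deployment would replace the fixed sizing rule with a model trained on observed metrics, so the plan is re-evaluated rather than fixed once.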

Advantages

• Eliminate manual intervention from the deployment process (application and infrastructure)

• Reduce complexity. Major and minor driving factors can now be considered when strategising the deployment plan

• Global and local optimization is possible

Automating the process

TOSCA

• Topology and Orchestration Specification for Cloud Applications
• Standardised language to describe
  o Detailing of the application & infrastructure in a portable manner
  o Defines the structure and composition of applications and their infrastructure
  o Defines the relationships
  o Specifies state and behaviour (deploy, shutdown, restart etc.)
  o Relates this to the cloud infrastructure management policies (and associated SLAs)
• Model that specifies applications, virtual and physical infrastructure.
• Stores the information in a ‘service template’ in YAML, which is processed at deploy time to perform the virtual and physical deployment
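To make the idea of a service template concrete, here is a minimal sketch. A real template is written in YAML; it is shown here as the equivalent Python dictionary so the example stays self-contained. The node names (`vnf_server`, `firewall_app`, `monitor_agent`) are hypothetical, and `deployment_order` is an illustrative helper, not part of any TOSCA toolchain.

```python
# Python-dict rendering of a minimal TOSCA-style service template (normally a YAML file).
service_template = {
    "tosca_definitions_version": "tosca_simple_yaml_1_0",
    "topology_template": {
        "node_templates": {
            "vnf_server": {
                "type": "tosca.nodes.Compute",
                "capabilities": {"host": {"properties": {"num_cpus": 2, "mem_size": "4 GB"}}},
            },
            "firewall_app": {
                "type": "tosca.nodes.SoftwareComponent",
                "requirements": [{"host": "vnf_server"}],
            },
            "monitor_agent": {
                "type": "tosca.nodes.SoftwareComponent",
                "requirements": [{"host": "vnf_server"}, {"dependency": "firewall_app"}],
            },
        }
    },
}

def deployment_order(template):
    """Order node templates so that every node comes after the nodes it requires."""
    nodes = template["topology_template"]["node_templates"]
    ordered, visiting = [], set()

    def visit(name):
        if name in ordered:
            return
        if name in visiting:
            raise ValueError("cyclic requirement involving " + name)
        visiting.add(name)
        for requirement in nodes[name].get("requirements", []):
            for target in requirement.values():
                visit(target)
        visiting.discard(name)
        ordered.append(name)

    for name in sorted(nodes):
        visit(name)
    return ordered

order = deployment_order(service_template)
print(order)
```

The relationships in the template are what let an orchestrator derive the deployment sequence instead of having it scripted by hand.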

Application Topology

• Defined at 3 levels
  o Infrastructure (cloud and DC objects)
  o Platform / Middleware (App Containers)
  o Application modules and their configuration

Service Orchestration

• Should address
  o Cloud Infra Orchestration
  o Container Orchestration
  o Network Orchestration
  o Application Orchestration (including Legacy Applications)

TOSCA supported ML Models

[Figure: ML loop. Labels: Ceilometer Logs; Gather Metrics; ML (Meteos); Build & Update models; Candidate Model Params; Revise & Select; Conductor; Modify Template; Redeploy]

Training System

• Pruned Decision Tree

• Neural Network

• Hyperparameter optimization using cross-validation (Random Forests)
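Hyperparameter selection by cross-validation can be sketched with the standard library alone. Note the substitution: the slide names decision trees, neural networks, and random forests, whereas this sketch uses a k-nearest-neighbour classifier because it fits in a few lines. The data is synthetic and the SLA labelling is hypothetical.

```python
import random

def knn_predict(train, point, k):
    """Majority label among the k training samples nearest to `point` (single feature)."""
    nearest = sorted(train, key=lambda s: abs(s[0] - point))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0

def cross_val_accuracy(data, k, folds=5):
    """Mean accuracy of k-NN over `folds` train/validation splits."""
    scores = []
    for f in range(folds):
        validation = data[f::folds]
        train = [s for i, s in enumerate(data) if i % folds != f]
        correct = sum(knn_predict(train, x, k) == y for x, y in validation)
        scores.append(correct / len(validation))
    return sum(scores) / folds

def select_k(data, candidates=(1, 3, 5, 7)):
    """Pick the hyperparameter value with the best cross-validated accuracy."""
    results = {k: cross_val_accuracy(data, k) for k in candidates}
    return max(results, key=results.get), results

# Synthetic data: latency in ms, labelled 1 if the request violated a hypothetical SLA.
rng = random.Random(0)
data = [(rng.gauss(20, 5), 0) for _ in range(50)] + [(rng.gauss(40, 5), 1) for _ in range(50)]
rng.shuffle(data)
best_k, results = select_k(data)
print(best_k, results)
```

The same loop applies to a random forest: the candidates become, for example, tree counts or depths, and the scoring function trains and evaluates the forest on each fold.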

Metrics, a few examples

• Number of instances

• Instance size

• Demand of Load

• Inter arrival request time

• Delay time / Latency to service a request

• Workload latency

• Throughput time for service

• Telemetry data

• Network demand

• Number of SLA violations

• Number of containers

• Cost of number of replication sets
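Several of these metrics can be derived directly from a request log. The sketch below is illustrative only: the log format of (arrival, completion) timestamp pairs, the function name, and the one-second SLA limit are assumptions, not something the deck specifies.

```python
def summarize_requests(records, sla_latency_s):
    """Compute a few of the metrics above from (arrival_time_s, completion_time_s) pairs."""
    records = sorted(records)
    arrivals = [arrived for arrived, _ in records]
    latencies = [done - arrived for arrived, done in records]
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    span = max(done for _, done in records) - arrivals[0]
    return {
        "mean_inter_arrival_s": sum(gaps) / len(gaps) if gaps else None,
        "mean_latency_s": sum(latencies) / len(latencies),
        "throughput_rps": len(records) / span if span > 0 else None,
        "sla_violations": sum(1 for latency in latencies if latency > sla_latency_s),
    }

# Hypothetical request log: (arrival, completion) timestamps in seconds.
log = [(0.0, 0.5), (1.0, 1.5), (2.0, 4.0), (4.0, 4.5)]
summary = summarize_requests(log, sla_latency_s=1.0)
print(summary)
```

Records like this, collected per instance or per container, form the feature vectors that the training system consumes.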

Technology Stack

• Apache Kafka

• WEKA

• Scala

• Python & Java languages

• Docker

• Kubernetes

• Kata


Practical Viewpoint from Dell Labs

Michael Tien


Redfish – the next-generation systems management standard for an evolving IT environment

• DMTF Scalable Platform Management Forum has created an open industry standard specification and schema for simple, modern, and secure management of scalable platform hardware
• A secure, multi-node, RESTful management interface built upon HTTPS in JSON format based upon OData v4
• Schema-based but human-readable; usable by client applications and browser-based GUIs
• Covers key use cases and customer requirements

What can Redfish do today?

Provides a common interface across platforms and vendors, supporting

▪ Reset, reboot, and power control of servers
▪ Inventory of server hardware and firmware versions
▪ Monitoring of server health status
▪ Access to system logs
▪ Alerts on server health status changes
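As an illustration of the power-control use case, the sketch below builds (but does not send) a request for the standard Redfish `ComputerSystem.Reset` action. The host address is a documentation-range placeholder, `build_reset_request` is a hypothetical helper, and the system ID `System.Embedded.1` is assumed to match the Dell iDRAC convention used elsewhere in this deck.

```python
import json
import urllib.request

def build_reset_request(host, system_id="System.Embedded.1", reset_type="GracefulRestart"):
    """Build, without sending, a Redfish ComputerSystem.Reset POST request."""
    url = "https://{}/redfish/v1/Systems/{}/Actions/ComputerSystem.Reset".format(host, system_id)
    body = json.dumps({"ResetType": reset_type}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )

req = build_reset_request("192.0.2.10")  # placeholder address, not a real iDRAC
print(req.get_method(), req.full_url)
# Actually sending the request would also need credentials and TLS handling.
```

Other `ResetType` values defined by the Redfish schema include `On`, `ForceOff`, `GracefulShutdown`, and `ForceRestart`; which ones a given server accepts is reported by the system resource itself.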

Delivering the benefits of Redfish - 14G iDRAC9 with Lifecycle Controller

New for 14G iDRAC9 RESTful API with Redfish

• iDRAC RESTful API enables modern, secure, scalable management automation
• Conformant with Redfish 1.2
  o BIOS configuration
  o Secure boot configuration
  o Firmware inventory and update
• Enhanced iDRAC RESTful API extensions
  o Profile-driven server configuration and update
  o iDRAC configuration

Modern tools for Redfish management automation

import requests

# Placeholders: substitute your own iDRAC address and credentials.
IDRAC = 'https://<iDRAC IP>'
AUTH = ('root', 'passwd')

# verify=False skips TLS certificate validation (iDRACs often use self-signed certificates).
system = requests.get(IDRAC + '/redfish/v1/Systems/System.Embedded.1', verify=False, auth=AUTH)
storage = requests.get(IDRAC + '/redfish/v1/Systems/System.Embedded.1/Storage/Controllers/RAID.Integrated.1-1', verify=False, auth=AUTH)

# Decode the JSON response bodies into dictionaries.
systemData = system.json()
storageData = storage.json()

# Print the server inventory and health summary.
print("Model: {}".format(systemData['Model']))
print("Manufacturer: {}".format(systemData['Manufacturer']))
print("Service tag: {}".format(systemData['SKU']))
print("Serial number: {}".format(systemData['SerialNumber']))
print("Hostname: {}".format(systemData['HostName']))
print("Power state: {}".format(systemData['PowerState']))
print("Asset tag: {}".format(systemData['AssetTag']))
print("Memory size: {}".format(systemData['MemorySummary']['TotalSystemMemoryGiB']))
print("CPU type: {}".format(systemData['ProcessorSummary']['Model']))
print("Number of CPUs: {}".format(systemData['ProcessorSummary']['Count']))
print("System status: {}".format(systemData['Status']['Health']))
print("RAID health: {}".format(storageData['Status']['Health']))

Server inventory with Python scripting

Server storage health status via Postman plug-in

• IT developers are seeking
  o Fast, reliable, and repeatable outcomes
  o On-demand runtime environment creation
  o Consistent staging and production environment
• Emerging solutions utilize orchestration tools and RESTful programming
  o “Infrastructure as Code”
  o Complete version control covering code, configuration, and data
  o Aligns development and operations
• Overriding goal
  o “Desired state” management for deployment, update, and configuration drift control
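Configuration drift control amounts to comparing the desired state held in version control with the state a server reports. A minimal sketch follows; the attribute names and values are hypothetical and chosen for illustration, and in practice the reported state would come from the management API.

```python
def find_drift(desired, actual):
    """Return {attribute: (desired_value, actual_value)} for every mismatch."""
    drift = {}
    for key, want in desired.items():
        have = actual.get(key)  # None when the server does not report the attribute
        if have != want:
            drift[key] = (want, have)
    return drift

# Hypothetical attribute names and values, for illustration only.
desired_state = {"BootMode": "Uefi", "SecureBoot": "Enabled", "LogicalProc": "Enabled"}
reported_state = {"BootMode": "Uefi", "SecureBoot": "Disabled"}

drift = find_drift(desired_state, reported_state)
print(drift)
```

An automation tool would then re-apply only the drifted attributes, which keeps remediation small and auditable.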

New for 14G iDRAC9 RESTful API with Redfish

• Server Configuration Profiles (SCP) enable RESTful configuration of PowerEdge BIOS, iDRAC/LC, PERC controllers, NICs, and HBAs

• API provides for export, preview, and import operations to replicate existing and create custom server configurations

• SCP files can be stored on CIFS, NFS, or HTTP/S network shares or streamed within API

• SCP XML and JSON file formats

• Firmware update from network-based repository

• Zero-touch Auto Configuration via CIFS, NFS or HTTP/S network share


What’s next for Redfish?

Dell EMC and the DMTF are driving development of Redfish, with significant additions planned:

• “Swordfish” external storage standards
• Network switch API standards
• Environmental APIs for power and HVAC
• Interoperability with Open Compute Project, OpenStack, and orchestration solutions
• Expanded automation developer tooling

SNIA: Adding to Redfish Resource Map

• Block storage
  o Provisioning with class of service control
  o Volume Mapping and Masking
  o Replication
  o Capacity and health metrics
• File system storage
  o Adds File System and File Share
  o Leverages all other concepts (provisioning with class of service, replication, …)
• Additional content
  o Object drive storage

Profiles define sets of required functionality to support:

• Basic Swordfish support
  o Hosted service configuration
  o Integrated service configuration
• Add-on functionality
  o Local replication
  o Remote replication
• Certification conformance requirements (in plans)
• EnergyStar requirements, orthogonal to functionality profiles
  o Energy and power metrics
  o Controls for on-demand instrumentation

SNIA: Adding Storage to Redfish: Swordfish (Hosted Service Configuration)

• Block storage
  o Provisioning with class of service control
  o Volume Mapping and Masking
  o Replication
  o Capacity and health metrics
• File system storage
  o Adds File System and File Share
  o Leverages all other concepts (provisioning with class of service, replication, …)
• Additional content
  o Object drive storage

SNIA: Adding Storage to Redfish: Swordfish (Integrated Service Configuration)

• Block storage
  o Provisioning with class of service control
  o Volume Mapping and Masking
  o Replication
  o Capacity and health metrics
• File system storage
  o Adds File System and File Share
  o Leverages all other concepts (provisioning with class of service, replication, …)
• Additional content
  o Object drive storage


Thanks. Questions?
