Automation + Machine Learning = Hands Free NFV
A Word On “Automation through ML for OpenStack NFV”
PRAKASH RAMCHANDRAN
MICHAEL TIEN
JAYANTHI A GOKHALE
01.11.2017
Agenda
• NFV Automation Challenges? Automation of NFV Service & NFV Infrastructure
• What’s new in Standards? ZSM, the new NFV Zero Touch evolving standards
• What can ML bring to automation? Manual to ML-driven intelligent automation
• Practical Viewpoint from Dell Labs? Redfish
• Practical Viewpoint from Dell Labs? Swordfish
• What’s next? The industry is moving towards E2E Orchestration and Management
NFV Adoption Challenge
• Lack of end-to-end automation
• Lack of interoperability between NFVI/VNF and VNF/VNF
• Unpredictable datacenter planning
• Lack of service awareness
• APIs are not easily consumable
• Limited service-to-service programmability
• Not zero touch
• Requires various resources to maintain (IT, DevOps, Operations)
• Requires multiple POCs
Zero Touch Networking Service and Management
• M2M communications, provisioning, and management
• Dynamic service chain data mapping
• Dynamic policy enhancement and enforcement
• Continuous data collection and analytics
• Auto reactive + proactive self-healing
• Real-time datacenter capacity scheduler
• Autonomous end-to-end orchestration lifecycle management
• Intelligent service-state awareness optimization
• Smart API
Zero touch NFV provides true next generation
NFVaaS or VNFaaS
What is Machine Learning?
Machine learning is a way for infrastructure or platforms to learn progressively from input data, building and validating models of system behavior in order to attain desirable outcomes (e.g., handling FCAPS, i.e. fault, configuration, accounting, performance, and security, in telco terms).
Why Does Automation need ML?
In our case the goal is anomaly detection across systems, networking, and network functions, framed in terms of FCAPS.
This can be done with supervised, unsupervised, or dynamic learning.
The basic requirement is a closed-loop control mechanism.
Self-healing within layers (local policy) and ML across layers (global policy).
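The closed-loop mechanism can be sketched in a few lines of Python. This is a minimal illustration, assuming a single latency metric and a rolling z-score detector standing in for a trained model; the names `ZScoreDetector`, `closed_loop`, and `restart_vnf`, as well as the sample data, are hypothetical and not taken from the talk.

```python
from collections import deque
from statistics import mean, pstdev


class ZScoreDetector:
    """Flags a sample as anomalous when it lies far from the recent mean."""

    def __init__(self, window=20, threshold=3.0, min_samples=5):
        self.history = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def is_anomaly(self, value):
        anomalous = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.threshold
            else:
                anomalous = value != mu
        # Only normal samples update the baseline, so an outage does not
        # become the new "normal".
        if not anomalous:
            self.history.append(value)
        return anomalous


def closed_loop(samples, detector, remediate):
    """Observe, detect, act: call remediate() for every anomalous sample."""
    actions = []
    for step, value in enumerate(samples):
        if detector.is_anomaly(value):
            actions.append(remediate(step, value))
    return actions


def restart_vnf(step, value):
    # Hypothetical local-policy action; a real system would call an orchestrator.
    return "step {}: latency {} ms, restart VNF".format(step, value)


latencies_ms = [10, 11, 9, 10, 12, 10, 11, 95, 10, 9]
actions = closed_loop(latencies_ms, ZScoreDetector(), restart_vnf)
print(actions)
```

In this sketch the detector plays the role of the ML model and `restart_vnf` the role of a local self-healing policy; a cross-layer (global) policy would consume the same anomaly signal from several layers before deciding on an action.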
What’s new in Standards / Opensource for NFV Stack
Connected Vehicle Application | Service / Application | OSS/BSS | TR 188 004, Open Policy Agent
Network Slicing | NS | NFVO/SDNO | ONAP, SDNC
Container workload | VNF | VNFM | EMS, VNF SDK
Kata, CNI, NVMe | VM, VN, VS | VIM | Containerized OS
Traditional Manual Deploy Cycle
The changing landscape of Infrastructure
• Bare metal
• Hypervisor
• VM – Booting, Secure Booting, Booting
from Volume
• Container CRI & CNI
• Lightweight VM – Kata (Intel)
• VM in container – not yet named (Red Hat)
Why NFV Automation needs innovation?
• NFV and SDN integrated clouds are growing from centralized to geographically scattered, massively distributed clouds.
• Orchestration, management, and maintenance have therefore become more challenging and demand more attention to distributed, hybrid clouds; the need of the hour is to accelerate service velocity.
• Automation is a prime solution for provisioning and maintaining the complete environment.
• With a mix of intelligent infrastructure and machine learning it is possible to target dynamic cloud management.
• We focus here on Service & Infrastructure Management automation.
• We share our experience dealing with compute (Redfish), storage (Swordfish), and networking (SDN-WAN), how we add closed loops, and the benefits derived from data collection and analysis with ML.
• This leads to Hands Free or Zero Touch NFV.
Some Statistics
• 80% of outages impacting mission-critical applications are caused by people and process issues
• 50% of these are caused by change, configuration, handoff, release integration, redeployable application services, etc.
• Though the number of downtime hours has fallen, the cost of downtime is now 50X
• Automate the deployment process, intelligently
Intelligent Infrastructure Deployment
• Identify smart ways to create, manage and orchestrate federation environments.
• ML can be utilized to train AI systems to recognize demand and deployment patterns in the context of various Service Level Objective metrics, called dimensions, such as
• Number of VM instances
• Network demand
• Migration metrics
• Latency measures
• SLA parameters of throughput
• Number and type of SLA violations
• Cluster sizes
• AZs
• ML can be used to devise optimized containers, container sizing, planning of microservices
• Results in true Agile Infrastructure provisioning.
Intelligent Infrastructure
• Service Providers can easily and efficiently accommodate the demands of
mixed workloads from a single platform.
• Leveraging the QoS capabilities, policies can be provisioned and enforced to
isolate each workload while running simultaneously within a shared
infrastructure.
• ML needs vast amounts of real-time performance data, generated by a QoS monitor and by network telemetry, to recognize developing performance issues early, before they negatively impact the user experience. The ML output is then used to fine-tune or redeploy the infrastructure to optimize the QoS metrics.
Automated Deployment Process
• Static Deployment
o Templated. Flavours can be used to select resources based on requirements.
• Dynamic Deployment
o Dynamically determine the deployment context and deployment parameters, then define the deployment plan. Once defined, it remains static.
• Smart / Intelligent Deployment
o ML and AI driven deployment to optimise the Service Level Objectives. The deployment plan is predicted, evaluated, customized and optimized.
• A TOSCA document serves as the deployment description: it describes the services and applications to be deployed on the cloud.
Advantages
• Eliminate manual intervention from the deployment process (application and infrastructure)
• Reduce complexity: major and minor driving factors can now be considered when strategising the deployment plan
• Global and local optimization is possible
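The static, flavour-based selection mentioned for templated deployments can be sketched as a simple lookup. This is a minimal illustration under assumed values: the flavour catalogue, its sizes, and the function name `select_flavour` are hypothetical.

```python
# Illustrative flavour catalogue; names and sizes are assumptions.
FLAVOURS = [
    {"name": "small", "vcpus": 1, "ram_mb": 2048, "disk_gb": 20},
    {"name": "medium", "vcpus": 2, "ram_mb": 4096, "disk_gb": 40},
    {"name": "large", "vcpus": 4, "ram_mb": 8192, "disk_gb": 80},
]


def select_flavour(requirements, flavours=FLAVOURS):
    """Return the smallest flavour that satisfies every requirement, or None."""
    fits = [
        flavour for flavour in flavours
        if all(flavour[key] >= needed for key, needed in requirements.items())
    ]
    return min(
        fits,
        key=lambda f: (f["vcpus"], f["ram_mb"], f["disk_gb"]),
        default=None,
    )


choice = select_flavour({"vcpus": 2, "ram_mb": 3000})
print(choice["name"])
```

A smart deployment would replace the fixed requirements with values predicted from observed metrics, while the selection step itself could stay this simple.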
TOSCA
• Topology and Orchestration Specification for Cloud Applications
• Standardised language to describe
o Detailing of the application & infrastructure in a portable manner
o Defines the structure and composition of applications and their infrastructure
o Defines the relationships
o Specifies state and behaviour (deploy, shutdown, restart, etc.)
o Relates these to the cloud infrastructure management policies (and associated SLAs)
• Model that specifies applications and virtual and physical infrastructure.
• Stores the information in a ‘service template’ in YAML, which is processed at deploy time to perform the virtual and physical deployment
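To make the service template concrete, here is a minimal sketch that builds one as a Python dictionary using the node type and capability names from the TOSCA Simple Profile in YAML. The node name `vnf_server`, the sizes, and the helper `compute_node` are hypothetical. The template is printed as JSON only to keep the example free of third-party libraries; a real template would normally be written in YAML.

```python
import json


def compute_node(num_cpus, mem_size, disk_size):
    """Build a TOSCA Compute node template with its host capability."""
    return {
        "type": "tosca.nodes.Compute",
        "capabilities": {
            "host": {
                "properties": {
                    "num_cpus": num_cpus,
                    "mem_size": mem_size,
                    "disk_size": disk_size,
                }
            }
        },
    }


service_template = {
    "tosca_definitions_version": "tosca_simple_yaml_1_0",
    "description": "Illustrative single-node service template",
    "topology_template": {
        "node_templates": {
            # "vnf_server" is a hypothetical node name.
            "vnf_server": compute_node(2, "4 GB", "20 GB"),
        }
    },
}

print(json.dumps(service_template, indent=2))
```

Keeping the template as plain data is what makes it easy for an automated step to modify a property (for example `num_cpus`) and hand the revised template back to the orchestrator.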
Application Topology
• Defined at 3 levels
o Infrastructure (cloud and DC objects)
o Platform / Middleware (App Containers)
o Application modules and their configuration
Service Orchestration
• Should address
o Cloud Infra Orchestration
o Container Orchestration
o Network Orchestration
o Application Orchestration (including Legacy Applications)
TOSCA supported ML Models
[Figure: closed-loop workflow. Box labels: Ceilometer Logs, Gather Metrics, Build & Update Models, ML Meteos, Candidate Model Params, Revise & Select, Modify Template, Conductor, Redeploy]
Training System
• Pruned Decision Tree
• Neural Network
• Hyperparameter optimization using cross-validation (Random Forests)
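Hyperparameter selection by cross-validation can be illustrated without any ML library. The sketch below is not the training system from the talk: it swaps in a tiny k-nearest-neighbour classifier for the decision tree or random forest, and it uses synthetic data with an assumed rule (SLA violations above 50 ms latency, 10% label noise). All names are hypothetical.

```python
import random


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def cross_val_accuracy(data, k, folds=5):
    """Mean accuracy over `folds` train/validation splits."""
    scores = []
    for i in range(folds):
        validation = data[i::folds]
        training = [p for j, p in enumerate(data) if j % folds != i]
        hits = sum(knn_predict(training, x, k) == y for x, y in validation)
        scores.append(hits / len(validation))
    return sum(scores) / folds


# Synthetic samples: (latency in ms, 1 if the SLA was violated).
rng = random.Random(0)
data = []
for _ in range(200):
    latency = rng.uniform(0, 100)
    label = int(latency > 50)
    if rng.random() < 0.1:  # flip 10% of labels to simulate noise
        label = 1 - label
    data.append((latency, label))

# Evaluate each candidate hyperparameter and keep the best-scoring one.
scores = {k: cross_val_accuracy(data, k) for k in (1, 5, 15)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The same loop applies to any model with tunable parameters: each candidate is scored only on data it was not trained on, which is what guards the selection against overfitting.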
Metrics, a few examples
• Number of instances
• Instance size
• Demand of Load
• Inter arrival request time
• Delay time / Latency to service a request
• Workload latency
• Throughput time for service
• Telemetry data
• Network demand
• Number of SLA violations
• Number of containers
• Cost of number of replication sets
Technology Stack
• Apache Kafka
• WEKA
• Scala
• Python & Java languages
• Docker
• Kubernetes
• Kata
Redfish – the next-generation systems management standard for an evolving IT environment
• DMTF Scalable Platform Management Forum has created an
open industry standard specification and schema for simple,
modern, and secure management of scalable platform
hardware
• A secure, multi-node, RESTful management interface built upon
HTTPS in JSON format based upon OData v4
• Schema-based but human-readable; usable by client
applications and browser-based GUIs
• Covers key use cases and customer requirements
What Redfish can do today?
Provides a common interface across platforms
and vendors supporting
▪ Reset, reboot, and power control servers
▪ Inventory server hardware and firmware
versions
▪ Monitor health status of server
▪ Access system logs
▪ Alert on server health status changes
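As an illustration of the power-control use case, the sketch below builds (but does not send) a request for the standard Redfish `ComputerSystem.Reset` action. The host address is a documentation placeholder, the default system ID mirrors the iDRAC path used elsewhere in this deck, and the helper name `build_reset_request` is hypothetical. Authentication and the actual call (for example via `urllib.request.urlopen`) are deliberately left out.

```python
import json
import urllib.request


def build_reset_request(host, reset_type="GracefulRestart",
                        system_id="System.Embedded.1"):
    """Return an unsent POST request for the ComputerSystem.Reset action."""
    url = "https://{}/redfish/v1/Systems/{}/Actions/ComputerSystem.Reset".format(
        host, system_id)
    body = json.dumps({"ResetType": reset_type}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# 192.0.2.10 is a reserved documentation address, not a real iDRAC.
request = build_reset_request("192.0.2.10")
print(request.get_method(), request.full_url)
```

The reset types a given system accepts are advertised in its own Redfish resource, so a careful client would read the allowable values from the system before choosing one.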
New for 14G iDRAC9 RESTful API with Redfish
• iDRAC RESTful API enables modern,
secure, scalable management automation
• Conformant with Redfish 1.2
o BIOS configuration
o Secure boot configuration
o Firmware inventory and update
• Enhanced iDRAC RESTful API extensions
o Profile-driven server configuration
and update
o iDRAC configuration
Modern tools for Redfish management automation
import requests

# Replace <iDRAC IP> and the credentials with values for your own system.
# verify=False disables TLS certificate checks; acceptable only in a lab.
base = 'https://<iDRAC IP>/redfish/v1/Systems/System.Embedded.1'
auth = ('root', 'passwd')
system = requests.get(base, verify=False, auth=auth)
storage = requests.get(base + '/Storage/Controllers/RAID.Integrated.1-1', verify=False, auth=auth)
systemData = system.json()
storageData = storage.json()
print("Model: {}".format(systemData['Model']))
print("Manufacturer: {}".format(systemData['Manufacturer']))
print("Service tag: {}".format(systemData['SKU']))
print("Serial number: {}".format(systemData['SerialNumber']))
print("Hostname: {}".format(systemData['HostName']))
print("Power state: {}".format(systemData['PowerState']))
print("Asset tag: {}".format(systemData['AssetTag']))
print("Memory size: {}".format(systemData['MemorySummary']['TotalSystemMemoryGiB']))
print("CPU type: {}".format(systemData['ProcessorSummary']['Model']))
print("Number of CPUs: {}".format(systemData['ProcessorSummary']['Count']))
print("System status: {}".format(systemData['Status']['Health']))
print("RAID health: {}".format(storageData['Status']['Health']))
[Figures: server inventory with Python scripting; server storage health status via Postman plug-in]
• IT developers are seeking
o Fast, reliable, and repeatable outcomes
o On-demand runtime environment creation
o Consistent staging and production environments
• Emerging solutions utilize orchestration tools and RESTful programming
o “Infrastructure as Code”
o Complete version control covering code, configuration, and data
o Aligns development and operations
• Overriding goal
o “Desired state” management for deployment, update, and configuration drift control
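Configuration drift control boils down to comparing a desired state with what the system actually reports. The following is a minimal sketch of that comparison; the attribute names and values are made up for illustration, and `find_drift` is a hypothetical helper rather than part of any Redfish or iDRAC tooling.

```python
def find_drift(desired, actual, path=""):
    """Return {dotted.path: (desired, actual)} for every mismatching setting."""
    drift = {}
    for key, want in desired.items():
        here = "{}.{}".format(path, key) if path else key
        have = actual.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            # Recurse into nested sections such as per-NIC settings.
            drift.update(find_drift(want, have, here))
        elif want != have:
            drift[here] = (want, have)
    return drift


# Hypothetical attribute names, for illustration only.
desired = {"BootMode": "Uefi", "SecureBoot": "Enabled", "Nic": {"Vlan": 100}}
actual = {"BootMode": "Uefi", "SecureBoot": "Disabled", "Nic": {"Vlan": 200}}
print(find_drift(desired, actual))
```

In a desired-state workflow the `actual` dictionary would come from the management API, and every entry returned by `find_drift` would trigger either an alert or a corrective configuration push.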
New for 14G iDRAC9 RESTful API with Redfish
• Server Configuration Profiles (SCP) enable RESTful configuration of PowerEdge BIOS, iDRAC/LC, PERC controllers, NICs, and HBAs
• API provides for export, preview, and import operations to replicate existing and create custom server configurations
• SCP files can be stored on CIFS, NFS, or HTTP/S network shares or streamed within API
• SCP XML and JSON file formats
• Firmware update from network-based repository
• Zero-touch Auto Configuration via CIFS, NFS or HTTP/S network share
What’s next for Redfish?
Dell EMC and the DMTF are driving development of Redfish, with significant additions planned:
• “Swordfish” external storage standards
• Network switch API standards
• Environmental APIs for power and HVAC
• Interoperability with Open Compute Project, OpenStack, and orchestration solutions
• Expanded automation developer tooling
SNIA: Adding to Redfish Resource Map
• Block storage
o Provisioning with class of service control
o Volume mapping and masking
o Replication
o Capacity and health metrics
• File system storage
o Adds File System and File Share
o Leverages all other concepts: provisioning with class of service, replication, …
• Additional content
o Object drive storage
• Profiles define sets of required functionality to support:
o Basic Swordfish support
- Hosted service configuration
- Integrated service configuration
o Add-on functionality
- Local replication
- Remote replication
o Certification conformance requirements (in plans)
o EnergyStar requirements: orthogonal to functionality profiles
- Energy and power metrics
- Controls for on-demand instrumentation
SNIA: Adding Storage to Redfish: Swordfish (Hosted Service Configuration)
• Block storage
o Provisioning with class of service control
o Volume mapping and masking
o Replication
o Capacity and health metrics
• File system storage
o Adds File System and File Share
o Leverages all other concepts: provisioning with class of service, replication, …
• Additional content
o Object drive storage
SNIA: Adding Storage to Redfish: Swordfish (Integrated Service Configuration)
• Block storage
o Provisioning with class of service control
o Volume mapping and masking
o Replication
o Capacity and health metrics
• File system storage
o Adds File System and File Share
o Leverages all other concepts: provisioning with class of service, replication, …
• Additional content
o Object drive storage