alok prakash intel corporation - 01.org | intel open source … · 2019-06-27 · alok prakash...

Running Pets on OpenStack

Alok Prakash

Intel Corporation

Challenges in running PET workloads in the Cloud

Trust, Performance, Assurance

Public and Private: Application Workloads & Data

Physical and virtualized machine management

CPU, IOPS, Memory, Memory bandwidth, CPU cache, Instruction set

Capability, Capacity, Consumption, Availability, Resilience

Data regulations, legacy workloads, compliance

Telemetry is hidden

Is my workload running on trusted infrastructure?

Will my workload get expected compute cycles?

How do I detect noisy neighbors?

Challenges in Trust Assurance

Can I create a trusted compute pool?

Can I whitelist node configurations?

Is there a BIOS rootkit?

Is my hypervisor trusted?

Can I detect when configurations have been altered?

[Diagram: a Linux compute node running KVM, hosting VMs with their apps and services]

Attest trust status of node software before placing VMs
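
A minimal sketch of the attest-before-placing idea, assuming each node reports boot-time measurements (hashes) of its BIOS and hypervisor and an operator keeps a whitelist of known-good values. The names `KNOWN_GOOD`, `measure`, `is_trusted` and `trusted_pool`, and the report layout, are illustrative assumptions; this is not the Intel TXT or Open Attestation interface.

```python
import hashlib


def measure(blob: bytes) -> str:
    """Stand-in for a boot-time measurement: SHA-256 of a component image."""
    return hashlib.sha256(blob).hexdigest()


# Hypothetical whitelist of known-good measurements per component.
KNOWN_GOOD = {
    "bios": {measure(b"bios-v1.2")},
    "hypervisor": {measure(b"kvm-build-42")},
}


def is_trusted(report: dict) -> bool:
    """A node is trusted only if every whitelisted component was reported
    and its measurement matches a known-good value."""
    return all(report.get(name) in good for name, good in KNOWN_GOOD.items())


def trusted_pool(reports: dict) -> list:
    """Return the names of nodes eligible to host VMs that require trust."""
    return sorted(node for node, rep in reports.items() if is_trusted(rep))


reports = {
    "node-a": {"bios": measure(b"bios-v1.2"), "hypervisor": measure(b"kvm-build-42")},
    "node-b": {"bios": measure(b"bios-v1.2"), "hypervisor": measure(b"kvm-tampered")},
    "node-c": {"bios": measure(b"bios-v1.2")},  # hypervisor measurement missing
}
print(trusted_pool(reports))
```

Only `node-a` passes here: `node-b` reports an altered hypervisor and `node-c` omits a measurement, so both are kept out of the pool.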

Challenges in Performance Assurance

How big a VM do I need for my workload?

What resources should be reserved, and how much burst capacity?

What metric do I use to specify and measure performance?

What is the performance capacity of the node?

What is the performance capacity of the machine flavor?

Am I getting specified performance without interference from ‘noisy neighbors’?

Define Normalized Compute Unit as a performance metric

[Diagram: a Linux compute node running KVM, hosting VMs with their apps and services]

Noisy Neighbor’s impact on VM performance

Monitoring, Reporting

Resource Director: Cache Monitoring, Cache Allocation

Platform Resource Metering

[Diagram: shared infrastructure. Two groups of eight cores, each group with a shared L3 cache, under a hypervisor hosting VMs that run the OS, applications and services]

Trust Assurance with Intel® Trusted Execution Technology

Service assurance for platform trust

Intel® Trusted Execution Technology (Intel® TXT)

Boot Attestation, Whitelisting

Remediation

Service Assurance Technology Overview

Components

• Nova Scheduler Plug-In

• Machine Flavor Creator

• Analysis & Remediation Engine

• Service Assurance Controller

• Monitoring Engine

• Capacity Insight

• REST API

• Web Admin Console

• Compute Node Agent on each compute node

SDI Challenge

• Unpredictable performance in multi-tenant cloud

• Unknown security status of compute nodes

• Grey machines (unhealthy state: power/thermal/fan) are not avoided by Nova compute scheduler

• Workload scheduling is sub-optimal, needs remediation

The Solution

• Intelligent Workload Placement based on ability of nodes to meet service level objectives

• Compute node capacity and utilization metrics normalized for specifying performance requirements

• VM performance monitoring and assurance with cache contention and “Noisy Neighbor” detection, pinning to CPU cores to prevent performance issues

• Reporting of trust of VMs and nodes: BIOS, Hypervisor whitelisting and attestation

Service Assurance Technologies are being made available as blueprints in OpenStack

Extending flavors with VM bursting and assurance capabilities

VM will only be created on nodes that have been trust-attested to be safe during boot

VM will reserve 1 SCU of compute performance, and burst up to 2 SCU based on available compute resources
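
The reserve-and-burst behavior can be sketched as a small placement model. This assumes node capacity and flavor requirements are both expressed in SCU, that a reservation is a hard guarantee, and that bursting draws on whatever capacity is currently unreserved. `Flavor`, `Node`, `can_place`, `place` and `burst_allowance` are hypothetical names for illustration, not part of Nova or of the product shown on the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Flavor:
    reserved_scu: float    # guaranteed compute performance
    burst_scu: float       # upper limit when spare capacity exists
    trusted: bool = False  # require a trust-attested node


@dataclass
class Node:
    capacity_scu: float
    trusted: bool
    reserved: list = field(default_factory=list)  # SCU reserved per placed VM

    def free_scu(self) -> float:
        return self.capacity_scu - sum(self.reserved)


def can_place(node: Node, flavor: Flavor) -> bool:
    """Placement needs a trusted node (if requested) and enough unreserved SCU."""
    if flavor.trusted and not node.trusted:
        return False
    return node.free_scu() >= flavor.reserved_scu


def place(node: Node, flavor: Flavor) -> None:
    if not can_place(node, flavor):
        raise RuntimeError("node cannot meet the flavor's requirements")
    node.reserved.append(flavor.reserved_scu)


def burst_allowance(node: Node, flavor: Flavor) -> float:
    """SCU an already-placed VM may use right now: its reservation plus
    unreserved capacity on the node, capped at the flavor's burst limit."""
    spare = max(node.free_scu(), 0.0)
    return min(flavor.burst_scu, flavor.reserved_scu + spare)


node = Node(capacity_scu=4.0, trusted=True)
flavor = Flavor(reserved_scu=1.0, burst_scu=2.0, trusted=True)
for _ in range(3):
    place(node, flavor)
print(burst_allowance(node, flavor))  # spare capacity remains, so bursting is possible
```

The model is deliberately simplified: it does not arbitrate between several VMs that try to burst into the same spare capacity at once.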

Extending flavors with trust and core-pinning capabilities

VM will reserve one dedicated core from the CPU compute resource

VM will only be created on nodes that have been trust-attested to be safe during boot
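
The dedicated-core reservation can be illustrated with a toy allocator. It assumes a node exposes a fixed number of physical cores, keeps the lowest-numbered cores for the host OS, and gives each pinned VM exactly one core that no other VM shares. `CoreAllocator` and its methods are hypothetical and do not reflect how Nova or libvirt implement pinning.

```python
class CoreAllocator:
    """Tracks which physical cores are dedicated to which VM (illustrative only)."""

    def __init__(self, total_cores: int, os_cores: int = 1):
        # Assumption: cores 0..os_cores-1 stay with the host OS.
        self.free = set(range(os_cores, total_cores))
        self.pinned = {}  # VM name -> core id

    def pin(self, vm: str) -> int:
        """Give the VM a dedicated core, or return the one it already holds."""
        if vm in self.pinned:
            return self.pinned[vm]
        if not self.free:
            raise RuntimeError("no dedicated core available on this node")
        core = min(self.free)
        self.free.remove(core)
        self.pinned[vm] = core
        return core

    def release(self, vm: str) -> None:
        """Return the VM's core to the free set, e.g. after the VM is deleted."""
        self.free.add(self.pinned.pop(vm))


alloc = CoreAllocator(total_cores=4)
print(alloc.pin("vm-1"), alloc.pin("vm-2"))
```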

How much performance capacity is available on my host?

For each compute node, the performance capacity status of the OS and VMs is monitored continuously, displaying how much resource is available, e.g. SCU and number of cores available

Finding out VM core-pinning status and metrics

Core pinning: the VM receives a dedicated core from the CPU

VM SLO status includes the number of cores assigned and VM uptime

Detect resource consumption levels for each compute node

Allocating SCU for OS-level consumption of compute resources

How much cache contention is there in the compute nodes?

Using Cache Monitoring Technology, a management console displays the LLC (last-level cache) contention status for the OS and VMs on each compute node
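
One way to turn cache monitoring data into a contention signal is sketched below. It assumes the monitoring layer already provides per-VM LLC occupancy samples in KB, and it flags a VM as a likely noisy neighbor when its average share of the cache exceeds a threshold. The function names, the sample values and the 50% threshold are all assumptions for illustration, not the method used by the console described here.

```python
from statistics import mean


def occupancy_share(samples_kb: dict, llc_kb: float) -> dict:
    """Average fraction of the last-level cache occupied by each VM."""
    return {vm: mean(values) / llc_kb for vm, values in samples_kb.items()}


def noisy_neighbors(samples_kb: dict, llc_kb: float, threshold: float = 0.5) -> list:
    """VMs whose average LLC share exceeds the (assumed) threshold."""
    shares = occupancy_share(samples_kb, llc_kb)
    return sorted(vm for vm, share in shares.items() if share > threshold)


# Hypothetical occupancy samples (KB) on a node with a 30 MB shared LLC.
samples = {
    "vm-1": [2048, 2304, 2176],
    "vm-2": [18432, 19456, 20480],
    "vm-3": [1024, 1536, 1280],
}
print(noisy_neighbors(samples, llc_kb=30720))
```

With these samples only `vm-2` is flagged, since it holds roughly 63% of the cache on average while the other two stay under 10%.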

Learning more about each VM’s SCU utilization metrics

Real-time VM SCU utilization and core pinning metrics

Which compute node is Trust-enabled?

Node is capable of running trust assured VMs

IT policy compliance reports indicate that VMs always run on a trust-attested node

Conducting probable cause analysis of “Noisy Neighbors” problems

Once a node has been identified as having Noisy Neighbors (VMs that aggressively use shared platform resources such as CPU cache), together with the VMs they affect, take remedial action, e.g. evacuate a VM to a different node
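
The evacuation step needs a destination. A minimal sketch of choosing one follows, assuming the controller knows each node's free SCU and trust status and prefers the node with the most headroom. `pick_target` and the node dictionary layout are hypothetical; an actual deployment would carry out the move through Nova's migration or evacuation facilities rather than this function.

```python
from typing import Optional


def pick_target(nodes: dict, source: str, needed_scu: float,
                need_trust: bool) -> Optional[str]:
    """Choose the node with the most free SCU that can host the VM.

    Skips the source node, nodes without enough free SCU, and (when the VM
    requires trust) nodes that are not trust-attested. Returns None if no
    node qualifies.
    """
    candidates = [
        (info["free_scu"], name)
        for name, info in nodes.items()
        if name != source
        and info["free_scu"] >= needed_scu
        and (info["trusted"] or not need_trust)
    ]
    return max(candidates)[1] if candidates else None


# Hypothetical cluster state.
nodes = {
    "node-a": {"free_scu": 0.5, "trusted": True},   # node with the noisy neighbor
    "node-b": {"free_scu": 3.0, "trusted": False},
    "node-c": {"free_scu": 2.0, "trusted": True},
}
print(pick_target(nodes, source="node-a", needed_scu=1.0, need_trust=True))
```

A VM that requires trust lands on `node-c` even though `node-b` has more headroom, which keeps remediation from breaking the trust guarantee the VM was created with.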

Intel Contributions to OpenStack

User Needs

Cost Reduction & Efficiency

High Availability

Trust & Compliance

Performance & Capacity

Available Until Kilo (Apr’15)

Intelligent scheduling & code stability; Higher security & performance

• Service level assurance through CPU usage monitoring

• Neutron module stabilization

• Manual VM migration (host evacuation)

• Trusted Compute Pools (TCP) through Intel® TXT and Open Attestation SW

• Support for virtual firewall applications

• Role based access control improvements

• Enhanced platform awareness (e.g. AES-NI, Intel® AVX) for intelligent scheduling

• Improved application I/O performance (SR-IOV)

• Pinning of certain NFV workloads to physical CPU cores and I/O devices

• Storage policies for Swift object storage

• Tagging of images in Glance so they can be assigned the appropriate resources

• Erasure codes in Swift to reduce storage cost

Deployability & Stability

• Versioned Objects support in the Oslo module (for future rolling upgrades)

Next Steps

o Help us enhance OpenStack* for PET workloads

o Normalized Compute Unit: https://blueprints.launchpad.net/nova/+spec/normalized-compute-units

o Platform Health Aware Scheduling: https://blueprints.launchpad.net/nova/+spec/platform-health-aware-scheduling

o Coming soon: Platform Contention Aware Scheduling

Disclaimer

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at [intel.com].

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Include as footnote in appropriate slide where performance data is shown:

o § Configurations: [describe config + what test used + who did testing]

o § For more information go to http://www.intel.com/performance.

Intel, the Intel logo, {List the Intel trademarks in your document} are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© 2015 Intel Corporation.

Thank You!