Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure


Upload: pluribus-networks

Post on 11-Apr-2017


Page 1: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure

Proprietary & Confidential

Adding Visibility back into your Converged Infrastructure

Amit Singh, Sr. Technical Marketing Engineer, Pluribus Networks [email protected]

Matt Bushell, Sr. Director of Product Marketing, Pluribus Networks [email protected]

Page 2: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure

Agenda

Market, Definition

Visibility vis-à-vis Converged

Use Cases

Page 3: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure

Source: Networking Matters in a Hyperconverged World – Joe Skorupa, Gartner Data Center, Infrastructure & Operations Management Conference, Dec. 5-8, 2016

Page 4: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure

Source: Networking Matters in a Hyperconverged World – Joe Skorupa, Gartner Data Center, Infrastructure & Operations Management Conference, Dec. 5-8, 2016

Page 5: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure

So What’s Missing in Converged Infrastructure?

[Diagram labels: Compute, Storage, Scaled Compute, DAS-based Storage, Converged Infrastructure, Network]

The Network!

Visibility!

Page 6: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure

Pluribus Can Add Visibility to all Converged Infrastructure Use Case Workloads

VDI

Data Protection & Disaster Recovery

Big Data

Enterprise Applications

Collaboration and UC

Private & Hybrid Clouds

Branch Office

Page 7: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure

What’s One Thing Great Network Visibility is Great At? Troubleshooting!

My App is Slow!!

“Mean Time To Repair” (MTTR)

MTTI: Mean time to identify a problem

MTTK: Mean time to knowledge. Most of the time is spent triaging an incident across IT groups.

MTTF: Mean time to fix

MTTV: Mean time to verify
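The slide shows MTTR spanning the four stages MTTI, MTTK, MTTF and MTTV. As a rough illustration, assuming the stages run back to back and add up to the repair time (the slide implies this but states no formula), the arithmetic can be sketched as follows. The `Incident` type and all durations are hypothetical, not data from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Stage durations for one incident, in minutes (made-up values)."""
    mtti: float  # time to identify a problem
    mttk: float  # time to knowledge (triage across IT groups)
    mttf: float  # time to fix
    mttv: float  # time to verify

    def time_to_repair(self) -> float:
        # Assumption: the stages are sequential, so repair time is their sum.
        return self.mtti + self.mttk + self.mttf + self.mttv

    def triage_share(self) -> float:
        # Fraction of the repair time spent reaching knowledge (MTTK).
        return self.mttk / self.time_to_repair()


def mean_time_to_repair(incidents: list[Incident]) -> float:
    """Average repair time over a set of incidents."""
    return sum(i.time_to_repair() for i in incidents) / len(incidents)


incidents = [Incident(10, 90, 15, 5), Incident(5, 60, 10, 5)]
print(mean_time_to_repair(incidents))  # 100.0
print(incidents[0].triage_share())     # 0.75
```

In this made-up data the triage stage accounts for three quarters of the first incident, which is the kind of imbalance the slide points at when it says most of the time is spent triaging.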

Page 8: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure

Converged Infra + Pluribus Tools For Rapid Triaging

My App is Slow!!

Must be the network…

Application Admin

Network Admin

Converged Infra Admin

Page 9: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure

Pluribus VCFcenter™ Reduces MTTK!

Converged Services-aware

Application-aware

Integrated in the network… no monitoring infrastructure overhead

Every East-West connection… No sampling

Always ON = simple to use

ZERO Config = simple to deploy

Page 10: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure

Pluribus VCF Analytics – Nutanix Ready Certified

Fabric - Standard, interoperable, scalable, non-blocking fabric based on open networking switches

Controller-less - Centralized provisioning, automation and visibility of multiple switches (no external controllers/new protocols)

Tested - Highly-available, robust L2/L3/VXLAN control plane (Tolly Tested)

Proven - Deployed at scale in mission-critical Enterprise applications

[Diagram labels: IP Network, 3rd Party Spine Switch]

Page 11: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure


VCF-IA for Nutanix Use Cases Summary

1. Infrastructure baselining

2. Controller VM restart tracing and analysis

3. Virtual machine mobility tracing and analysis

4. Re-allocation of a cluster function (Prism)

Page 12: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure

VCF-IA 1.5 Nutanix Tagging Definition, Example

[Table residue. Axis: Priority, Low to High. Columns: Client (src), Server (dst). Groups: Security, Prism Management, White List.]

Acropolis Hypervisor moving a virtual machine

Controller VM participating in cluster configuration

Acropolis Hypervisor all other actions

Controller VM updates another CVM

Direct access to Prism on a Controller VM

Controller VM all other actions

End user VMs, application server VMs,…

Catch all on cluster VIP

Page 13: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure


VCF-IA 1.5 Nutanix Dashboard with Tagging

Page 14: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure

1. Baselining – Know the environment 1/3

Use VCF-IA to identify the typical usage pattern:

1. Identify the Nutanix cluster traffic by selecting applications such as Nutanix-Zeus (cluster configuration), the IPs of the physical nodes, the IPs of the Controller VMs, and the cluster management virtual IP.

2. An unusual level of connections in SYN and/or RST state can signal a failure condition.

[Figure: VCF-IA Dashboard, with an “Excessive RST/SYN” callout]
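What counts as an "unusual level" depends on the baseline. A minimal sketch of one way to flag it, assuming the per-interval counts of SYN and RST connections have been exported from the dashboard by some means (the export itself is not shown, and the function name, threshold rule and numbers are all hypothetical):

```python
from statistics import mean, stdev


def flag_anomalies(baseline, observed, k=3.0):
    """Return indices of observed intervals whose SYN+RST count exceeds
    the baseline mean by more than k sample standard deviations."""
    threshold = mean(baseline) + k * stdev(baseline)
    return [i for i, count in enumerate(observed) if count > threshold]


# Made-up SYN+RST connection counts per interval.
baseline = [12, 15, 11, 14, 13, 12, 16, 14]  # typical usage pattern
observed = [13, 15, 96, 14]                  # period under investigation
print(flag_anomalies(baseline, observed))    # [2]
```

The mean-plus-k-sigma rule is only one simple choice; it is used here because it needs nothing beyond the recorded baseline.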

Page 15: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure

1. Baselining – Using “project_tags” to identify the CVMs

Use the VCF-IA tag Nutanix_Device:

1. Identify the traffic originated by the Nutanix Controller Virtual Machines by selecting “CVM” in the quick search box.

2. An unusual level of connections in SYN and/or RST state can signal an anomaly.

[Figure: VCF-IA Dashboard filtered on CVM, with an “Excessive RST/SYN” callout]

Page 16: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure

1. Baselining – Using “project_tags” to identify the Nodes

Use the VCF-IA tag Nutanix_Device:

1. Identify the traffic originated by the Nutanix nodes by selecting “Node” in the quick search box.

2. Node activity can be correlated with cluster operations.

[Figure: VCF-IA Dashboard filtered on Nodes, showing node activity]

Page 17: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure

2. Troubleshooting – Observe the event

1. Select the connections in SYN state to filter the failed communication attempts.

2. Correlate the timing of reported performance problems with the SYN traffic initiated from the CVMs (clients).

[Figure: Zeus (cluster configuration) connections among CVM-1, CVM-2 and CVM-3. SYN: failed attempts to communicate, marked at the problem time.]
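The two steps on this slide amount to a filter followed by a time correlation. A minimal sketch, assuming connection records are available as (timestamp, client, server, state) rows; the `Conn` type, the CVM names and the timestamps are made up for illustration and are not a VCF-IA export format:

```python
from datetime import datetime
from typing import NamedTuple


class Conn(NamedTuple):
    ts: datetime  # when the connection was recorded
    src: str      # client, e.g. a CVM
    dst: str      # server
    state: str    # recorded TCP state: "SYN", "EST", "FIN", "RST"


def syn_in_window(conns, clients, start, end):
    """Count SYN-state connections per client within [start, end]."""
    counts = {}
    for c in conns:
        if c.state == "SYN" and c.src in clients and start <= c.ts <= end:
            counts[c.src] = counts.get(c.src, 0) + 1
    return counts


def at(minute):
    """Helper for readable made-up timestamps (10:00 to 10:59)."""
    return datetime(2017, 1, 1, 10, minute)


# Hypothetical records; the reported problem window is 10:00 to 10:15.
conns = [
    Conn(at(1), "CVM-2", "CVM-1", "SYN"),
    Conn(at(3), "CVM-3", "CVM-1", "SYN"),
    Conn(at(5), "CVM-2", "CVM-1", "SYN"),
    Conn(at(7), "CVM-2", "CVM-3", "EST"),
    Conn(at(40), "CVM-3", "CVM-1", "SYN"),
]
print(syn_in_window(conns, {"CVM-2", "CVM-3"}, at(0), at(15)))
# {'CVM-2': 2, 'CVM-3': 1}
```

A cluster of SYN-only attempts inside the reported window, as in this made-up data, is the pattern the slide associates with failed communication.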

Page 18: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure

2. Troubleshooting – Analyze the Event

[Figure: Zeus/Zookeeper cluster configuration traffic, with servers and applications not relevant for the analysis excluded. Labels: CVM-2, CVM-3; CVM-1 is unresponsive; “Are you alive?!”; SYN: no answer for 15 minutes!]

1. Apply filters to narrow the analysis to the specific connections relevant for the Nutanix cluster.

2. A brief CVM failure could pass unnoticed except for a limited performance hit.

3. VCF-IA provides a distinct recording of the event.
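To turn a recording like "SYN: no answer for 15 minutes" into a number, one can look for runs of unanswered SYN attempts toward a single CVM. The sketch below assumes time-sorted (timestamp, server, state) records and treats any non-SYN state as evidence that the server answered; both the record layout and that rule are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def unanswered_episodes(records, dst):
    """Return (first_syn, last_syn) pairs for runs of SYN-only records
    toward dst. records: time-sorted (timestamp, dst, state) tuples."""
    episodes, start, last = [], None, None
    for ts, server, state in records:
        if server != dst:
            continue
        if state == "SYN":
            if start is None:
                start = ts
            last = ts
        elif start is not None:
            # Assumption: any other state means the server responded again.
            episodes.append((start, last))
            start = None
    if start is not None:
        episodes.append((start, last))  # run still open at end of data
    return episodes


def at(minute):
    """Helper for readable made-up timestamps."""
    return datetime(2017, 1, 1, 10, minute)


# Hypothetical records: CVM-1 ignores SYNs for 15 minutes, then answers.
records = [
    (at(0), "CVM-1", "SYN"),
    (at(5), "CVM-1", "SYN"),
    (at(10), "CVM-1", "SYN"),
    (at(15), "CVM-1", "SYN"),
    (at(16), "CVM-1", "EST"),
]
episodes = unanswered_episodes(records, "CVM-1")
first, last = episodes[0]
print(len(episodes), last - first)  # 1 0:15:00
```

The length of the returned list gives the number of such episodes in the recorded history, and each pair gives when it happened.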

Page 19: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure

2. Troubleshooting – Back to Normal

1. After a failure, when the CVM is back online, it establishes connections to the other CVMs to sync the metadata.

2. VCF-IA records the CVM coming online, when Cassandra reestablishes the connections and keeps them open (EST).

3. Important question: how many times has this happened in the past, and when?

[Figure: Cassandra metadata service connections among CVM-1, CVM-2 and CVM-3, showing newly established connections]

Restored connections:

CVM-1 -> CVM-2

CVM-1 -> CVM-3

CVM-2 -> CVM-1

CVM-3 -> CVM-1

Back to normal: all nodes communicate via Cassandra.

Page 20: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure

3. Scale – Monitor the Workload Shifting East-West

1. The origin node provides the destination node with the context of a moving VM.

2. The communication is established only for the time needed to transfer the context, then it is terminated (FIN).

[Figure: Acropolis hypervisor-to-hypervisor connections. Callouts: move from node 3 to node 1; move from node 1 to node 3; move from node 3 to node 1 and back. Connections are closed after the move.]
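Because each move appears as a short-lived hypervisor-to-hypervisor connection that ends in FIN, completed moves can be listed by pairing each established connection with its later close. This is a sketch under assumptions: the (origin, destination, state) tuples and node names are hypothetical, and real records would also need timestamps and an application filter.

```python
def completed_moves(conns):
    """conns: time-ordered (origin_node, destination_node, state) tuples
    for hypervisor-to-hypervisor traffic. Returns (origin, destination)
    for every connection that was established (EST) and later closed (FIN)."""
    open_pairs = set()
    moves = []
    for origin, destination, state in conns:
        pair = (origin, destination)
        if state == "EST":
            open_pairs.add(pair)
        elif state == "FIN" and pair in open_pairs:
            open_pairs.remove(pair)
            moves.append(pair)
    return moves


# Made-up sequence: a VM moves from node 3 to node 1, then back.
conns = [
    ("node3", "node1", "EST"),
    ("node3", "node1", "FIN"),
    ("node1", "node3", "EST"),
    ("node1", "node3", "FIN"),
]
print(completed_moves(conns))
# [('node3', 'node1'), ('node1', 'node3')]
```

A pair left in `open_pairs` at the end would indicate a transfer that was opened but never closed in the recorded data.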

Page 21: Network Performance Monitoring: Adding Visibility back into your Converged Infrastructure

4. Application – Which node owns Prism?

1. The Nutanix cluster management GUI connects to the cluster virtual IP.

2. VCF-IA lets you track over time which CVM owns the cluster virtual IP and easily detect unusual changes.

Node 2 failure: the virtual IP moves from node 2 to node 1, from physical port 7 to port 8.

Prism service was restored after about 5 minutes.

[Figure: VCF-IA Search on the Nutanix Prism application, showing the Prism client, the Prism VIP, and the timeline: node 2 failure, then VIP moves to node 1]
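Tracking which node owns the cluster virtual IP comes down to watching for changes in where the VIP is seen. A minimal sketch, assuming periodic observations of the VIP as (timestamp, node, physical port) tuples; the tuple layout, timestamps and helper name are hypothetical, and the node and port values mirror the slide's example only for illustration.

```python
def vip_owner_changes(samples):
    """samples: time-ordered (timestamp, node, port) observations of the
    cluster virtual IP. Returns (timestamp, previous, current) for every
    observation where the owning node or physical port changed."""
    changes = []
    prev = None
    for ts, node, port in samples:
        cur = (node, port)
        if prev is not None and cur != prev:
            changes.append((ts, prev, cur))
        prev = cur
    return changes


# Made-up observations: the VIP moves from node 2 (port 7) to node 1 (port 8).
samples = [
    ("10:00", "node2", 7),
    ("10:05", "node2", 7),
    ("10:10", "node1", 8),
    ("10:15", "node1", 8),
]
print(vip_owner_changes(samples))
# [('10:10', ('node2', 7), ('node1', 8))]
```

An empty result over a long period is the normal case; each entry is a candidate for the kind of unusual change the slide describes.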