network performance monitoring: adding visibility back into your converged infrastructure
TRANSCRIPT
Proprietary & Confidential
Adding Visibility back into your Converged Infrastructure
Amit Singh, Sr. Technical Marketing Engineer, Pluribus [email protected]
Matt Bushell, Sr. Director of Product Marketing, Pluribus [email protected]
Agenda
Market, Definition
Visibility vis a vis Converged
Use Cases
Source: Networking Matters in a Hyperconverged World, Joe Skorupa, Gartner Data Center, Infrastructure & Operations Management Conference, Dec. 5-8, 2016
So What’s Missing in Converged Infrastructure?
Storage
Compute
Scaled Compute
Converged Infrastructure
Network
DAS-based Storage
The Network!
Visibility!
Pluribus Can Add Visibility to all Converged Infrastructure Use Case Workloads
VDI
Data Protection & Disaster Recovery
Big Data
Enterprise Applications
Collaboration and UC
Private & Hybrid Clouds
Branch Office
What’s One Thing Great Network Visibility is Great At? Troubleshooting!
My App is Slow!!
“Mean Time To Repair” (MTTR)
MTTI: Mean time to identify a problem
MTTK: Mean time to knowledge. Most of the time is spent triaging an incident across IT groups.
MTTF: Mean time to fix
MTTV: Mean time to verify
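The four phases above can be put into numbers with a small sketch. It assumes MTTR is modeled as the simple sum of MTTI, MTTK, MTTF and MTTV; the slide groups them under MTTR but does not state a formula. The class name and the sample durations used with it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IncidentPhases:
    """Durations in minutes for the four phases named on the slide."""
    mtti: float  # mean time to identify a problem
    mttk: float  # mean time to knowledge (triage across IT groups)
    mttf: float  # mean time to fix
    mttv: float  # mean time to verify

    def mttr(self) -> float:
        # Assumption: MTTR is the sum of the four phases.
        return self.mtti + self.mttk + self.mttf + self.mttv

    def share(self, phase: str) -> float:
        # Fraction of the total repair time spent in one phase.
        return getattr(self, phase) / self.mttr()
```

With illustrative values such as `IncidentPhases(10, 60, 20, 10)`, `share("mttk")` shows how much of the repair time goes to triage, which is the phase the deck targets.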
Converged Infra+Pluribus Tools For Rapid Triaging
My App is Slow!!
Must be the network…
Application Admin
Network Admin
Converged Infra Admin
Pluribus VCFcenter™ Reduces MTTK!
Converged Services-aware
Application-aware
Integrated in the network… no monitoring infrastructure overhead
Every East-West connection… No sampling
Always ON = simple to use
ZERO Config = simple to deploy
Pluribus VCF Analytics – Nutanix Ready Certified
Fabric - Standard, interoperable, scalable, non-blocking fabric based on open networking switches
Controller-less - Centralized provisioning, automation and visibility of multiple switches (no external controllers/new protocols)
Tested - Highly-available, robust L2/L3/VXLAN control plane (Tolly Tested)
Proven - Deployed at scale in mission-critical Enterprise applications
IP Network
3rd Party Spine Switch
VCF-IA for Nutanix Use Cases Summary
1. Infrastructure baselining
2. Controller VM restart tracing and analysis
3. Virtual machine mobility tracing and analysis
4. Re-allocation of a cluster function (Prism)
VCF-IA 1.5 Nutanix Tagging Definition, Example
Low Priority
High
Security
Prism Management
White List
Acropolis Hypervisor moving a virtual machine
Controller VM participating cluster configuration
Acropolis Hypervisor all other actions
Controller VM updates another CVM
Direct access to Prism on a Controller VM
Controller VM all other actions
End user VMs, application server VMs, …
Catch all on cluster VIP
Client (src)
Server (dst)
VCF-IA 1.5 Nutanix Dashboard with Tagging
1. Baselining – Know the environment 1/3
Use VCF-IA to identify the typical usage pattern:
1. Identify the Nutanix cluster traffic by selecting applications such as Nutanix-Zeus (cluster configuration), the physical node IPs, the Controller VM IPs, and the cluster management virtual IP.
2. An unusual level of connections in SYN and/or RST state can signal a failure condition.
Excessive RST/SYN
VCF-IA Dashboard
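The baselining idea above can be sketched outside the dashboard as a comparison of TCP state shares between a baseline period and the current period. This is a minimal illustration, not the VCF-IA implementation: the record layout (a dict with a `state` key), the function names, and the `factor` threshold are all assumptions.

```python
from collections import Counter


def state_shares(connections):
    """Return the fraction of connections observed in each TCP state."""
    counts = Counter(c["state"] for c in connections)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()} if total else {}


def excessive_states(baseline, current, watch=("SYN", "RST"), factor=2.0):
    """List watched states whose current share exceeds factor x the baseline share."""
    base = state_shares(baseline)
    now = state_shares(current)
    flagged = []
    for state in watch:
        # A state absent from the baseline is flagged as soon as it appears.
        if now.get(state, 0.0) > factor * base.get(state, 0.0):
            flagged.append(state)
    return flagged
```

A ratio against the baseline is used here rather than an absolute count, so the check stays meaningful as the cluster grows; a real deployment would likely tune the threshold per application.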
1. Baselining – Using “project_tags” to identify the CVMs
Use VCF-IA tag Nutanix_Device:
1. Identify the traffic originated by the Nutanix Controller Virtual Machines by selecting “CVM” in the quick search box
2. An unusual level of connections in SYN and/or RST state can signal an anomaly
Excessive RST/SYN
VCF-IA Dashboard
CVM
1. Baselining – Using “project_tags” to identify the Nodes
Use VCF-IA tag Nutanix_Device:
1. Identify the traffic originated by the Nutanix nodes by selecting “Node” in the quick search box
2. Node activity can be correlated with cluster operations
Nodes activity
VCF-IA Dashboard
Nodes
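Selecting “CVM” or “Node” in the quick search box amounts to filtering connections by a device tag on the client side. The sketch below illustrates that idea on plain Python dicts. The `Nutanix_Device` key mirrors the tag name on the slides, but the record layout, the function names and the IP-to-tag mapping are hypothetical and do not represent the VCF-IA data model.

```python
def tag_connections(connections, device_tags):
    """Attach a Nutanix_Device tag to each record based on the client (src) IP."""
    return [
        {**c, "Nutanix_Device": device_tags.get(c["src"], "untagged")}
        for c in connections
    ]


def originated_by(connections, tag):
    """Keep only the connections whose client carries the given tag."""
    return [c for c in connections if c.get("Nutanix_Device") == tag]
```

Tagging once and filtering afterwards keeps the mapping of IPs to roles in one place, so the same records can be sliced by “CVM” and by “Node” without re-reading them.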
2. Troubleshooting – Observe the event
CVM-2
CVM-3
CVM-1
Zeus, cluster configuration
SYN: failed attempts to communicate
1. Select the connection in SYN state to filter the failed communication attempts.
2. Correlate the timing of reported performance problems with the SYN traffic initiated from CVMs (clients).
Problem time
2. Troubleshooting – Analyze the Event
Zeus/Zookeeper cluster configuration
Exclude servers and applications not relevant for the analysis
CVM-2
CVM-3
CVM-1 is unresponsive
Are you alive?!
SYN: no answer for 15 minutes!
1. Apply filters to narrow the analysis to the specific connections relevant to the Nutanix cluster
2. A brief CVM failure could pass unnoticed except for a limited performance hit
3. VCF-IA keeps a distinct record of the event
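The analysis above, where peers keep sending SYN to a CVM that never answers, can be approximated by grouping SYN-state records per server and measuring how long the attempts span. This is a simplified sketch under stated assumptions: records are dicts with hypothetical `time` (seconds), `dst` and `state` keys, and it does not check whether the server answered in between.

```python
def unanswered_windows(records, min_duration_s=300):
    """Report servers whose SYN-state records span at least min_duration_s seconds.

    Returns {server: (first_seen, last_seen)}.
    """
    spans = {}
    for r in records:
        if r["state"] != "SYN":
            continue
        first, last = spans.get(r["dst"], (r["time"], r["time"]))
        spans[r["dst"]] = (min(first, r["time"]), max(last, r["time"]))
    return {
        dst: (first, last)
        for dst, (first, last) in spans.items()
        if last - first >= min_duration_s
    }
```

The 300-second default is arbitrary; in the slide's example the unanswered period lasts about 15 minutes, which any threshold below 900 seconds would catch.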
2. Troubleshooting – Back to Normal
1. After a failure, when the CVM is back online, it establishes connections to the other CVMs to sync the metadata
2. VCF-IA records the CVM coming back online, when Cassandra reestablishes the connections and keeps them open (EST)
3. Important question: how many times has this happened in the past? When?
CVM-1
CVM-2
CVM-3
Cassandra metadata service
Newly established connections
Restored connections:
CVM-1 -> CVM-2
CVM-1 -> CVM-3
CVM-2 -> CVM-1
CVM-3 -> CVM-1
Back to normal
All nodes communicate via Cassandra
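The question of how often this happened in the past can be framed as counting SYN-to-EST transitions per client/server pair in the recorded history. The sketch below assumes time-stamped records with hypothetical `time`, `src`, `dst` and `state` keys; it is an illustration of the idea, not a VCF-IA query.

```python
def recovery_events(records):
    """Return (time, src, dst) each time a pair goes from SYN to EST."""
    last_state = {}
    events = []
    for r in sorted(records, key=lambda r: r["time"]):
        pair = (r["src"], r["dst"])
        # A pair that was stuck in SYN and is now established has recovered.
        if last_state.get(pair) == "SYN" and r["state"] == "EST":
            events.append((r["time"], r["src"], r["dst"]))
        last_state[pair] = r["state"]
    return events
```

The length of the returned list answers "how many times", and the timestamps answer "when".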
3. Scale – Monitor the Workload Shifting East-West
1. The origin node provides the destination node with the context of the moving VM
2. The connection is established only for the time needed to transfer the context, then it is terminated (FIN)
Acropolis hypervisor to hypervisor
Move from node 3 to node 1
Move from node 1 to node 3
Move from node 3 to node 1 and back
Connections are closed after the move
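Because each move shows up as a short hypervisor-to-hypervisor connection that ends in FIN, a move history can be derived from the closed connections between nodes. The sketch below assumes the origin node is the client (`src`) of the connection, which the slide suggests but does not state outright. The record keys and the `node_names` mapping are hypothetical.

```python
def vm_moves(records, node_names):
    """List (time, origin, destination) for closed node-to-node transfers.

    node_names maps a hypervisor IP to a readable node name.
    """
    moves = []
    for r in sorted(records, key=lambda r: r["time"]):
        closed = r["state"] == "FIN"
        between_nodes = r["src"] in node_names and r["dst"] in node_names
        if closed and between_nodes:
            moves.append((r["time"], node_names[r["src"]], node_names[r["dst"]]))
    return moves
```

A move "from node 3 to node 1 and back" would appear as two consecutive entries with the origin and destination swapped.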
4. Application – Which node owns Prism?
1. The Nutanix cluster management GUI connects to the cluster virtual IP
2. VCF-IA makes it possible to track over time which CVM owns the cluster virtual IP and to easily detect unusual changes
Node 2 failure: virtual IP moves from node 2 to node 1, from physical port 7 to port 8
… node 2 failure … VIP moves to node 1
Prism service was restored after about 5 minutes
Prism Client
Prism VIP
Prism Application
Nutanix Prism
VCF-IA Search
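Tracking which node owns the cluster virtual IP over time reduces to detecting changes in a time series of (time, owner) observations. The following sketch shows that under stated assumptions: the observation format and the function name are hypothetical, and how the owner is derived (for example from the physical port where the VIP is seen) is left out.

```python
def vip_owner_changes(observations):
    """Return (time, previous_owner, new_owner) whenever the VIP owner changes.

    observations is an iterable of (time, owner) tuples.
    """
    changes = []
    previous = None
    for time, owner in sorted(observations):
        if previous is not None and owner != previous:
            changes.append((time, previous, owner))
        previous = owner
    return changes
```

In the node 2 failure shown on the slide, such a list would hold one entry recording the move from node 2 to node 1; several entries in a short period would be the kind of unusual change worth investigating.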
Thank You, Questions?