
#clmel

ACI Troubleshooting Tools and Best Practices

BRKACI-3001

Gerard Chami – Datacenter Solutions Engineer

© 2015 Cisco and/or its affiliates. All rights reserved. BRKACI-3001 Cisco Public

Agenda

• Setting the stage for troubleshooting

• Troubleshooting Tools

• Best Practices

– Things to do

– Things to watch out for

• Conclusion


Overview of ACI Fabric Policy Mechanisms: Working with ACI

[Diagram labels: Admin, GUI, CLI, Web API Tools, Object Browser, Python SDK, REST]

Overview of ACI Fabric Policy Mechanisms: Logical Model

[Diagram: logical model object tree with fvTenant, fvAp, two fvAEPg objects (each with an fvRsBd), fvBD, fvRsCtx, and fvCtx]

Overview of ACI Fabric Policy Mechanisms: Resolved Model

[Diagram: the logical model tree (fvTenant, fvAp, fvAEPg, fvRsBd, fvBD, fvRsCtx, fvCtx) alongside two copies of the resolved tree, labeled Policy Manager and Policy Element: fvEpPCont, fvEpP, fvLocale, fvStPathAtt, fvDyPathAtt, fvIfConn]

Overview of ACI Fabric Policy Mechanisms: Concrete Model

[Diagram: on the Policy Element, the resolved tree (fvEpPCont, fvEpP, fvLocale, fvStPathAtt, fvDyPathAtt, fvIfConn) next to the concrete tree under sys: l3Ctx, l2BD, vlanCktEp, l2RsPathDomAtt, vxlanCktEp]

Overview of ACI Fabric Policy Mechanisms: Hardware Programming – Forwarding Plane

[Diagram: the concrete tree under sys (l3Ctx, l2BD, vlanCktEp, l2RsPathDomAtt, vxlanCktEp) next to the iNXOS constructs: vlan, vxlan, vrf, BGP, ospf, isis, vrf overlay-1, interfaces]

Setting The Stage For Troubleshooting

Troubleshooting Checklist

• Check health scores to narrow down the affected scope

• Check for faults in the system. If anything fails to deploy, faults are raised

• Check that the resolved object model is present on both the APIC and the relevant leaves

• Check that the concrete objects are present on the relevant leaves

• Verify iNXOS using the iNXOS shell commands

Troubleshooting Tools

Graphical User Interface (GUI): GUI Tools

Faults, Health, Audits, Events

Statistics, Call-home, Syslogs, SNMP

APIC Command Line (iShell): APIC SSH Access

SSH to the APIC

admin@fab1_apic1:~> ls
aci debug mit

admin@fab1_apic1:~> cd aci
admin@fab1_apic1:aci> ls
admin fabric l4-l7-services system tenants vm-networking

admin@fab1_apic1:~> cd debug
admin@fab1_apic1:debug> ls
apic1 apic2 apic3 leaf1 leaf2 spine1 spine2

admin@fab1_apic1:~> cd mit
admin@fab1_apic1:mit> ls
comp dbgs expcont fwrepo topology uni

CLI Available at the Switch

CLI Shell

leaf101# ls
aci bootflash data dev isan lib mit proc sys usb var
bin controller debug etc lc logflash mnt sbin tmp usr volatile

Enter NXOS shell

leaf101# vsh
Cisco NX-OS Software

Enter NXOS hardware internals

leaf101# vsh_lc
module-1#

Entering the Broadcom shell

leaf101# bcm-shell-hw
bcm-shell.0>

It is also possible to execute VSH/bcm-shell-hw commands directly from iShell, using the syntax:

vsh -c "<command>"
vsh_lc -c "<command>"
bcm-shell-hw "<command>"

Visore – It’s Italian for Viewer!


Moquery - Command Line Cousin to Visore!

admin@apic1:~> moquery -d uni/tn-gchami-tn

Total Objects shown: 1

# fv.Tenant

name : gchami-tn

childAction :

descr :

dn : uni/tn-gchami-tn

lcOwn : local

modTs : 2015-02-04T23:27:05.622+00:00

monPolDn : uni/tn-common/monepg-default

ownerKey :

ownerTag :

rn : tn-gchami-tn

status :

uid : 15374
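The "key : value" layout of the moquery output above is easy to post-process. The following is a minimal sketch, assuming the plain-text layout shown on this slide (a "# class" header followed by one attribute per line); the function name parse_moquery_text and the shortened SAMPLE text are illustrative and not part of any Cisco tool.

```python
def parse_moquery_text(text):
    """Parse 'key : value' moquery output into a list of (class_name, attributes)."""
    objects = []
    attrs = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the object-count banner.
        if not line or line.startswith("Total Objects shown"):
            continue
        if line.startswith("#"):
            # A '# fv.Tenant' style header starts a new object.
            attrs = {}
            objects.append((line.lstrip("#").strip(), attrs))
        elif attrs is not None and ":" in line:
            # Split on the first colon only, so timestamps stay intact.
            key, _, value = line.partition(":")
            attrs[key.strip()] = value.strip()
    return objects


# Shortened copy of the output shown on the slide.
SAMPLE = """\
Total Objects shown: 1

# fv.Tenant
name : gchami-tn
childAction :
dn : uni/tn-gchami-tn
modTs : 2015-02-04T23:27:05.622+00:00
rn : tn-gchami-tn
"""

parsed = parse_moquery_text(SAMPLE)
class_name, tenant = parsed[0]
print(class_name, tenant["dn"])
```

Splitting on the first colon only is a deliberate choice here: values such as modTs contain colons of their own.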


Scripts: Cobra SDK, ACI toolkit

API Inspector – Built into the GUI


REST API - Postman

Tools in Action

Health, Faults, Events, Audits & Objects


Health Score Degraded - Identification

Navigating to the System Health Dashboard will identify the switch that has a diminished health score

• Double clicking on that leaf will allow navigation into the faults raised on that device. Here we click on rtp_leaf1

Drilling Down


Double Click on Degraded Health Score or Highlight the Health Tab

Health Score


Getting to Object Fault

Interface 1/35 on this leaf is having issues

The interface has a fault because it is used by an EPG but is missing an SFP transceiver

Faults in ACI

A fault is a Managed Object (MO) contained in the Management Information Tree (MIT)

It is a child of the affected MO

It has the following properties:

• code

• severity

• lifecycle

• description

• timestamps

A fault's RN is “fault-<code>”, for example, fault-F123

Can be queried by DN and class (fault:Inst)

[Diagram: example tree with chassis-1 containing card-1 and card-2, a port-1 below, and the fault objects fault-F123, fault-F456, and fault-F789 attached as children]
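Because a fault is a child of the affected MO and its RN is "fault-<code>", the fault DN can be derived from the parent DN and the code. A small sketch of that naming rule follows; the helper names are hypothetical, and the example DN is the one that appears in the moquery JSON output later in this deck.

```python
def fault_rn(code):
    """Relative name of a fault, following the 'fault-<code>' rule."""
    return f"fault-{code}"


def fault_dn(parent_dn, code):
    """DN of a fault raised on the MO identified by parent_dn."""
    return f"{parent_dn}/{fault_rn(code)}"


def split_fault_dn(dn):
    """Return (affected MO DN, fault code) for a fault DN.

    The fault RN is always the last component, so splitting on the
    last '/fault-' is safe even when the parent DN contains brackets
    with slashes, such as phys-[eth1/11].
    """
    parent, sep, code = dn.rpartition("/fault-")
    if not sep:
        raise ValueError(f"not a fault DN: {dn}")
    return parent, code


dn = fault_dn("sys/phys-[eth1/11]", "F1186")
print(dn)
print(split_fault_dn(dn))
```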


Fault Lifecycle

Timer and severity values can be customized using monitoring policies

Faults in GUI

Look for the “faults” tab on the right

Keep an eye out for fault indications

Faults Using Moquery

Getting all faults in txt to analyze later:

leaf1# moquery -c faultInst > /tmp/fault-20141112.txt
leaf1# ls -l /tmp/fault-20141112.txt
-rw------- 1 admin admin 40113 Nov 13 13:37 /tmp/fault-20141112.txt

Want to get all configuration-failed faults?

leaf1# moquery -c faultInst -f 'fault.Inst.code == "F0467"' | egrep "cause|dn"
cause : configuration-failed
dn : uni/epp/fv-[uni/tn-testTenant2/ap-testAP/epg-testEPG]/nwissues/fault-F0467

Want that in json?

leaf1# moquery -c faultInst -o json
{"imdata": [
{"faultInst": {
"attributes": {"dn": "sys/phys-[eth1/11]/fault-F1186","domain": "infra","code": "F1186","occur": "1","subject": "failure-to-deploy","severity": "warning","descr": "Port configuration failure.
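Once the faults are saved as JSON, they can be filtered offline. The sketch below assumes the {"imdata": [{"faultInst": {"attributes": {...}}}]} layout visible in the truncated output above; the second record in SAMPLE_JSON is invented purely for illustration, and the helper name faults_with_severity is hypothetical.

```python
import json

# First record reuses fields shown on the slide (the truncated 'descr' is
# omitted); the second record is a made-up example, not real output.
SAMPLE_JSON = """
{"imdata": [
  {"faultInst": {"attributes": {
      "dn": "sys/phys-[eth1/11]/fault-F1186", "domain": "infra",
      "code": "F1186", "occur": "1",
      "subject": "failure-to-deploy", "severity": "warning"}}},
  {"faultInst": {"attributes": {
      "dn": "sys/example/fault-F0000", "domain": "infra",
      "code": "F0000", "occur": "3",
      "subject": "example", "severity": "critical"}}}
]}
"""


def faults_with_severity(document, severity):
    """Return the attribute dicts of all faultInst entries with the given severity."""
    matches = []
    for entry in document.get("imdata", []):
        attrs = entry.get("faultInst", {}).get("attributes", {})
        if attrs.get("severity") == severity:
            matches.append(attrs)
    return matches


doc = json.loads(SAMPLE_JSON)
warnings = faults_with_severity(doc, "warning")
for fault in warnings:
    print(fault["code"], fault["dn"])
```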

Events in GUI

• Much like other navigation: History > Events


Accounting - Audit Log

• A mechanism to track user-initiated configuration changes

• When a user creates/modifies/deletes an MO, we create an “audit record” containing affected MO DN, user name, timestamp and change details

• System also creates logs for log-in/log-out to controllers and nodes

• Similar to an entry in a log file: once created, they are never modified

• Configuration change logs are MOs of class aaaModLR

• Login/logout logs are MOs of class aaaSessionLR

• Accounting logs get deleted only when a maximum number specified in a retention policy is hit
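Since configuration change logs are MOs of class aaaModLR, they can be retrieved with an ordinary REST class query. The sketch below only builds the query URL and does not contact an APIC. The host name is a placeholder, and the attribute name "user" used in the filter is an assumption that should be verified against the object model in Visore before use.

```python
def audit_log_url(apic_host, user=None):
    """Build a class-query URL for aaaModLR audit records.

    apic_host is a placeholder host name. The optional filter assumes
    the record exposes the user name in an attribute called 'user'.
    """
    url = f"https://{apic_host}/api/node/class/aaaModLR.json"
    if user is not None:
        url += f'?query-target-filter=eq(aaaModLR.user,"{user}")'
    return url


print(audit_log_url("apic1.example.com"))
print(audit_log_url("apic1.example.com", user="admin"))
```

The same URL can be pasted into Postman after logging in, which ties this back to the REST tooling mentioned earlier in the deck.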


Audit Log

Who created that?

Use Moquery or Visore to Check the Model

Logical model: fvTenant, fvAp, fvAEPg, fvRsBd, fvBD, fvRsCtx, fvCtx, fvSubnet, …

Resolved model: fvCtxDef, fvBDDef, fvEpP, fvLocale, fvStPathAtt, fvDyPathAtt, fvIfConn

Concrete model: l3Ctx, l2BD, vlanCktEp, vxlanCktEp, l2RtDomIfConn, vlanRsPathDomAtt, vlanRsVlanEppAtt, vxlanRsVxlanEppAtt

show vlans, show system internal epm …

vsh_lc: show system internal eltmc info …, show system internal epmc info

Datapath Troubleshooting Tools


iPing CLI

iping [options] <target ip address>

options:

-V vrf name (tenant:context)
-c count
-i wait
-p pattern
-s packet size
-t timeout
-S source ip address or source interface
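When iping is driven from a script, it helps to assemble the command line from named arguments. The following is a minimal sketch that uses only the options listed above; the function name build_iping and its parameter names are hypothetical, and the example values are the ones used on the next slide.

```python
import shlex


def build_iping(target, vrf, source=None, count=None, timeout=None, size=None):
    """Assemble an iping command line from the options listed on the slide."""
    args = ["iping", "-V", vrf]          # -V takes tenant:context
    if count is not None:
        args += ["-c", str(count)]
    if size is not None:
        args += ["-s", str(size)]
    if timeout is not None:
        args += ["-t", str(timeout)]
    if source is not None:
        args += ["-S", source]           # source IP address or interface
    args.append(target)
    # Quote each argument so unusual VRF names survive a shell.
    return " ".join(shlex.quote(a) for a in args)


command = build_iping("100.0.1.1", vrf="gchami:vrf01", source="100.0.1.254")
print(command)
```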


iPing Internal

[Diagram: Spine1 above Leaf1 and Leaf2, with TEP labels 10.0.96.95 and 10.0.96.92; endpoint EP1 at 100.0.1.1; arrows labeled iping from leaf1, iping from leaf2, and snoop]

Tenant: gchami

Context: vrf01

Subnet: 100.0.1.254/24, 100.0.2.254/24

iping -V gchami:vrf01 -S 100.0.1.254 100.0.1.1

• Set the source IP address to make clear which gateway address is used

• The ICMP echo reply packet to the remote leaf node is relayed by the local leaf node

iTraceroute CLI

Node traceroute:

itraceroute <dst-ip> [<pld-size>]

Tenant traceroute:

For VLAN encapsulated source EP

itraceroute <dst-ip> vrf <vrf-name> [ encap vlan [<vlan-encap>] ] [ payload <pld-size> ]

For VxLAN encapsulated source EP

itraceroute <dst-ip> vrf <vrf-name> encap vxlan [<vxlan-encap>] dst-mac <dst-mac> [ { payload <pld-size> } ]

Tenant iTraceroute - Example

pod2-leaf1# itraceroute 10.11.1.11 vrf RD
Tenant traceroute to 10.11.1.11, tenant VRF RD, source encap vlan-2101, from [10.0.40.66], payload 56 bytes

Path 1
1: TEP 10.0.64.65 intf eth1/33 0.746 ms
2: TEP 10.0.40.65 intf eth1/97 0.490 ms

Path 2
1: TEP 10.0.64.64 intf eth1/33 0.812 ms
2: TEP 10.0.40.65 intf eth1/98 0.526 ms

[Diagram: Spine1 and Spine2 (TEP labels 10.0.64.64 and 10.0.64.65) above Pod2-Leaf1 and Pod2-Leaf4 (TEP labels 10.0.40.66 and 10.0.96.92); destination endpoint 10.11.1.11]

Node iTraceroute - Example

pod2-leaf1# itraceroute 10.0.40.95

Node traceroute to 10.0.40.95, infra VRF overlay-1, from [10.0.40.66], payload 56 bytes

Path 1

1: TEP 10.0.64.64 intf eth1/33 0.611 ms

2: TEP 10.0.40.95 intf eth1/98 0.608 ms

Path 2

1: TEP 10.0.64.65 intf eth1/33 0.473 ms

2: TEP 10.0.40.95 intf eth1/97 0.540 ms

[Diagram: Spine1 and Spine2 (TEP labels 10.0.64.64 and 10.0.64.65) above Pod2-Leaf1 and Pod2-Leaf4 (TEP labels 10.0.40.66 and 10.0.96.92)]
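The per-path hop lines in the itraceroute output follow a regular pattern, so they can be turned into structured data for comparison across runs. This is a sketch that assumes the exact line layout shown in the node traceroute example above; the function name and dictionary keys are illustrative.

```python
import re

# Matches lines such as '1: TEP 10.0.64.64 intf eth1/33 0.611 ms'.
HOP_RE = re.compile(r"^(\d+):\s+TEP\s+(\S+)\s+intf\s+(\S+)\s+([\d.]+)\s+ms$")


def parse_itraceroute(output):
    """Return {path_number: [hop, ...]} from itraceroute text output."""
    paths = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Path "):
            current = int(line.split()[1])
            paths[current] = []
        elif current is not None:
            match = HOP_RE.match(line)
            if match:
                hop, tep, intf, rtt = match.groups()
                paths[current].append(
                    {"hop": int(hop), "tep": tep, "intf": intf, "rtt_ms": float(rtt)}
                )
    return paths


# Output copied from the node traceroute example on the slide.
SAMPLE = """\
Node traceroute to 10.0.40.95, infra VRF overlay-1, from [10.0.40.66], payload 56 bytes
Path 1
1: TEP 10.0.64.64 intf eth1/33 0.611 ms
2: TEP 10.0.40.95 intf eth1/98 0.608 ms
Path 2
1: TEP 10.0.64.65 intf eth1/33 0.473 ms
2: TEP 10.0.40.95 intf eth1/97 0.540 ms
"""

paths = parse_itraceroute(SAMPLE)
for number, hops in paths.items():
    print(number, [hop["tep"] for hop in hops])
```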


ACI SPAN

Infrastructure SPAN:

• Meant for traffic to/from access ports.

• Infra SPAN supports both local and remote (ERSPAN) destinations.

• Infra SPAN can also be filtered by an EPG.

• Configured in Fabric > Access Policies > Troubleshoot Policies > SPAN (source and destination)

Fabric SPAN:

• Meant for traffic to/from fabric ports (leaves and spines).

• Fabric SPAN supports remote destinations only, and can be filtered with a bridge domain or network context.

• Configured in Fabric > Fabric Policies > Troubleshoot Policies > SPAN

Tenant SPAN:

• Meant for traffic to/from EPGs; supports only remote destinations.

• Configured in the concerned Tenant > Troubleshoot Policies > SPAN (always remote)
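The three SPAN types differ mainly in what traffic they cover, which destinations they allow, and how they can be filtered. The sketch below restates those slide bullets as a small lookup table; the class and field names are hypothetical and exist only to make the comparison explicit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanType:
    name: str
    traffic: str
    local_destination: bool   # can mirror to a local port
    remote_destination: bool  # can mirror to a remote (ERSPAN) destination
    filters: tuple            # filter options named on the slide


SPAN_TYPES = (
    SpanType("infra", "access ports", True, True, ("EPG",)),
    SpanType("fabric", "fabric ports", False, True, ("bridge domain", "network context")),
    SpanType("tenant", "EPGs", False, True, ()),
)


def span_types_supporting_local():
    """Names of the SPAN types that allow a local destination."""
    return [s.name for s in SPAN_TYPES if s.local_destination]


print(span_types_supporting_local())
```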


Atomic Counters

• Troubleshooting tool to count packets and bytes between a source and a destination

• Only packets that traverse the fabric are counted

• Locally switched packets are not counted

• Packets switched in the hypervisors are not counted

• There are two types of counters: “ongoing” and “on demand” counters

• NTP must be properly set up and operational on every node and APIC (check on a node with show ntp peer-status)

Ongoing Atomic Counters

• Ongoing Atomic Counters are not user-configurable

• They count packets at the infrastructure level: the source and destination of the flow are Tunnel End Points (TEPs)

• Example: all packets sent from L1 to L3

• Paths are unidirectional: L1-to-L3 ≠ L3-to-L1

[Diagram: leaves L1, L2, L3, L4 below spines S1, S2]
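Because paths are unidirectional, each ordered pair of leaves is a separate path. The sketch below enumerates those pairs for the four leaves in the diagram, under the assumption (not stated on the slide) that there is one counted path per ordered pair of leaf TEPs.

```python
from itertools import permutations

leaves = ["L1", "L2", "L3", "L4"]

# Ordered pairs: (L1, L3) and (L3, L1) are distinct paths.
leaf_paths = list(permutations(leaves, 2))

print(len(leaf_paths))
print(("L1", "L3") in leaf_paths, ("L3", "L1") in leaf_paths)
```

For n leaves this gives n × (n − 1) ordered pairs, so the count grows quadratically with fabric size.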


On Demand Atomic Counters

• On Demand counters are configured by the tenant to troubleshoot issues at the level of individual applications

• The source and destination can be EPs, EPGs, IP addresses or “Any”

• For example, packets from EP1 to EPG2

[Diagram: EP1, EP2, EP3, and EPG2 below spines S1, S2]

Gathering Logs


Tech Support Features

• One interface to collect tech-support from any subset of fabric components and features

• Save to fabric, or export to remote server

• On-demand or periodic

• Configurable data collection

• Downloadable via http from the fabric

• Tech-support bundles are HUGE (multiple gigabytes of tar data)

• They mostly contain logs useful for development. For the postmortem of an event, collect tech-support from the APICs and the impacted leaves ASAP, as some logs roll over quickly.

Create Tech Support Policy

In Admin > Import/Export > Export Policies > Techsupport …

ACI Core Files

APIC Cluster Troubleshooting


APIC Debug Commands: acidiag fnvread

Show Fabric node vector

admin@ifav1-ifc1:~> acidiag fnvread
ID Name Serial Number IP Address Role State LastUpdMsgId
---------------------------------------------------------------------------------------------------------------------------------------------
1017 ifav1-leaf1 SAL17267Z9S 10.0.63.127/32 leaf active 0
1018 ifav1-leaf2 SAL1739D5WU 10.0.63.125/32 leaf active 0
1200 ifav1-spine1 SAL1748H575 10.0.63.126/32 spine active 0
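A quick way to spot nodes that are not active is to parse the fnvread table. The sketch below assumes the seven-column layout shown above, with whitespace-separated values; the field names are hypothetical labels for those columns rather than official attribute names.

```python
# Labels for the seven columns of the fnvread table (illustrative names).
FIELDS = ("id", "name", "serial", "ip", "role", "state", "last_upd_msg_id")


def parse_fnvread(output):
    """Return one dict per node row; header and separator lines are skipped."""
    nodes = []
    for line in output.splitlines():
        parts = line.split()
        # Data rows have exactly seven fields and start with a numeric node ID.
        if len(parts) == len(FIELDS) and parts[0].isdigit():
            nodes.append(dict(zip(FIELDS, parts)))
    return nodes


# Rows copied from the slide.
SAMPLE = """\
ID Name Serial Number IP Address Role State LastUpdMsgId
------------------------------------------------------------
1017 ifav1-leaf1 SAL17267Z9S 10.0.63.127/32 leaf active 0
1018 ifav1-leaf2 SAL1739D5WU 10.0.63.125/32 leaf active 0
1200 ifav1-spine1 SAL1748H575 10.0.63.126/32 spine active 0
"""

nodes = parse_fnvread(SAMPLE)
not_active = [n["name"] for n in nodes if n["state"] != "active"]
print(len(nodes), not_active)
```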


APIC Debug Commands: acidiag rvread

“acidiag rvread” shows replicas that are not healthy

“acidiag rvread <svc> <shard> <replica>” to see the state of one replica

[Screenshot callouts: Replica id; State 6=UP; Leadership state; APIC where it is running]

APIC Debug Commands: acidiag avread

Show APIC controller appliance vector

[Screenshot callouts: Cluster size; Chassis ID; Active; Summary of replica health]

ACI Best Practices

Things to do


Plan Your Naming Wisely!


Every Object In ACI Must Be Named.

Tenant, App Profile, Bridge Domain, Private Network, Contract, Filter, Subnet, EPG, Attachable Entity Profile, Interface Profile, Switch Profile, Interface Selector, VMM Domain, VLAN Pool, Physical Domain, L3 Outside, L2 Outside

A Good Naming Scheme Is Essential.

VLAN Pools: VL-DVS, VL-AVS, VL-L2-Out, VL-L3-Out

Filters: flt-http, flt-https, flt-sql

Interface Profiles: iprof-ucs, iprof-fex, iprof-ext-switch
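A naming scheme like this is easier to keep consistent when names are generated rather than typed. The sketch below encodes only the three prefixes shown above; the dictionary keys and function names are hypothetical.

```python
# Prefixes taken from the examples on the slide.
PREFIXES = {
    "vlan_pool": "VL-",
    "filter": "flt-",
    "interface_profile": "iprof-",
}


def object_name(kind, label):
    """Build a name such as 'flt-http' from an object kind and a label."""
    return PREFIXES[kind] + label


def kind_of(name):
    """Recover the object kind from a prefixed name, or None if unknown."""
    for kind, prefix in PREFIXES.items():
        if name.startswith(prefix):
            return kind
    return None


print(object_name("filter", "http"))
print(kind_of("iprof-ucs"))
```

Being able to recover the object kind from the name alone is what makes the scheme useful when reading faults and audit records.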


Set Up NTP!

Symmetry is Beautiful!


Choose Matching Interface Numbers

1/10 1/10

Makes For Simpler Policy!


Regular Config Exports


Plan Your Maintenance!


Upgrade Maintenance Groups

Maintenance group 1 Maintenance group 2

Things To Watch Out For!


The ACI Fabric Cannot Currently Be Used as a ‘Transit’ Network.


No Transit Routing Through Fabric.

[Diagram: Router A and Router B attached via L3 out to a Context / VRF in the ACI Fabric; prefix 192.168.1.0/24]

In this example, 192.168.1.0/24 will not be advertised to Router B.

Use a Unique ACI ‘infra’ IP Range when Provisioning the APIC.


VTEP addresses are allocated to nodes in the fabric automatically based on the pool configured at fabric initialisation.

Future services may cause overlapping address space issues. Changing the infra IP range is difficult, so choose a unique range.

Infra range: 10.0.0.0/16

Leaf 1: 10.0.104.95

Leaf 2: 10.0.104.96

Leaf 3: 10.0.104.97

Spine 1: 10.0.104.92

Spine 2: 10.0.104.93
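Whether a candidate infra range is unique can be checked before the APIC is provisioned. The sketch below uses Python's standard ipaddress module; the infra range is the one from the slide, while the list of existing ranges is invented for illustration.

```python
import ipaddress


def overlapping_ranges(infra_range, existing_ranges):
    """Return the existing ranges that overlap the candidate infra range."""
    infra = ipaddress.ip_network(infra_range)
    return [
        candidate
        for candidate in existing_ranges
        if infra.overlaps(ipaddress.ip_network(candidate))
    ]


# Hypothetical ranges already in use elsewhere in the data center.
in_use = ["10.0.104.0/24", "192.168.1.0/24", "10.1.0.0/16"]

conflicts = overlapping_ranges("10.0.0.0/16", in_use)
print(conflicts)
```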


Try to Choose a Unique ‘infra’ VLAN within the Fabric.


With AVS, the ‘infra’ VLAN gets extended out of the fabric.

If the default infra VLAN of 4093 is used, this becomes an issue if AVS is deployed ‘behind’ a Nexus 7K, etc. due to reserved VLAN ranges on that platform.

The infra VLAN should be chosen carefully if this scenario is required.

[Diagram: Spine 1 and Spine 2 above Leaf 1, Leaf 2, and Leaf 3; infra VLAN 4093 extends from the fabric through a Nexus 7K to AVS on ESXi]

Remember, Adjacent OSPF Devices Must Be Configured for NSSA!

[Diagram: two spines above four leaves, with two external routers attached]

router ospf 1
  vrf Blue
    area 0.0.0.1 nssa
  vrf Red
    area 0.0.0.2 nssa

Q & A


Complete Your Online Session Evaluation

Give us your feedback and receive a Cisco Live 2015 T-Shirt!

Complete your Overall Event Survey and 5 Session Evaluations.

• Directly from your mobile device on the Cisco Live Mobile App

• By visiting the Cisco Live Mobile Site http://showcase.genie-connect.com/clmelbourne2015

• Visit any Cisco Live Internet Station located throughout the venue

T-Shirts can be collected in the World of Solutions on Friday 20 March 12:00pm - 2:00pm

Learn online with Cisco Live! Visit us online after the conference for full access to session videos and presentations. www.CiscoLiveAPAC.com

Thank you.
