Welcome Technical Services Virtual Boot Camp Session 8
© 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Welcome
Technical Services Virtual Boot Camp
Session 8
Technical Services India Team
Recap – Session 7 (18th Feb)
Process
Technology
Cisco Support Community
Technology
· Architecture Overview: UCS C-series, UCS B-series
· UCS Interoperability: Hardware, Software
· Troubleshooting: Case Study (Lab Demo)
Q&A
Course Material
PPT: https://supportforums.cisco.com/docs/DOC-37994
Video: https://supportforums.cisco.com/videos/7517
Q&A: https://supportforums.cisco.com/docs/DOC-37851
Today's Agenda (Session 8)
Technology
· Firmware Install and Upgrade: UCS C-series, UCS B-series
· Troubleshooting: Case Study (Lab Demo), Important Logs, Part Identification and RMA
Q&A
Introduction
Nirmal Sodani, Technical Support Manager
Mohit Mmangal, Manager, CSC
Avinash Shukla, TAC Escalation Engineer
Vinay Sharma, Lead, CSC
Teclus D'Souza, TAC Escalation Engineer
Chetan Badami, Technical Escalation Engineer
Technology – UCS
Avinash Shukla
Teclus D'Souza
Chetan Badami
© 2011 Cisco and/or its affiliates. All rights reserved. Cisco Public BRKCOM-3001
Agenda
UCS Upgrade Procedure: C-series, B-series
UCS Troubleshooting: UCSM / FI / IOM / Blade, C-series
UCS H/W and S/W Interoperability
© 2010 Cisco Systems, Inc. All rights reserved. CAE BootcampPresentation_ID 8
UCS H/W and S/W Interoperability
Avinash Shukla, Cisco TAC
Operating System
Check the support matrix before installing the OS on the blade
Install the drivers (Eth / FC) and keep them updated as per the matrix
A few important things to check:
–Is the blade running a certified OS and OS version?
–Are there any special needs for that OS? E.g. VMware: OEM image
–Are the drivers at the OS level updated and current?
Answer:
–UCS S/W and H/W matrix
–http://www.cisco.com/web/techdoc/ucs/interoperability/matrix/matrix.html
–http://www.cisco.com/en/US/products/ps10477/prod_technical_reference_list.html
H/W and S/W Interop
What each matrix provides
Sample driver versions
Agenda
C series firmware upgrade
–Pre-requisites
–Firmware ISO location and downloading
–Upgrade process
B series firmware upgrade
–Pre-requisites
–Firmware bundles and downloading
–Upgrade process
–Additions / Modifications from version 2.1
Pre-requisites C Series
Things to consider
Release Notes cover gotchas and concerns in the upgrade process
Upgrades from one version back will always work
Check release notes about prior versions
–If the customer is really far behind, it might require two upgrades to get to current code
Schedule a maintenance window
–CIMC and server will reboot during upgrade
C Series Upgrade
Downloading the ISO file
Upgrade process C Series
Map the ISO on the KVM
Boot from Virtual Media
HUU Screen and options
After all component upgrade
Verify Upgrade
To verify, check that all components are upgraded
Pre-requisites B Series
Things to consider
Release Notes cover gotchas and concerns in the upgrade process
Upgrades from one version back will always work
Check release notes about prior versions
–If the customer is running a very old version, it might require two upgrades to get to current code
Schedule a maintenance window
–FI and IOM will reboot during upgrade
–Make sure network and storage fabric are redundant
Backing up the UCSM configuration is highly recommended
Be patient
The upgrade process is not quick
Bugs sometimes surface in the first release (FCS)
Expect a maintenance release shortly after FCS
Follow the upgrade procedure for each version
–The procedure is not always the same from one version to another.
Downgrading
Downgrading can sometimes cause data loss
You might have to erase the config to downgrade
–Database changes in new versions cannot always be back-ported
Bundles
Prior to 1.4 there was only a single all-inclusive bundle
Now there are multiple bundles
–Infra-bundle – contains code for FI, IOM, and UCSM
–B-series bundle – contains BIOS and blade specific code
–C-series bundle – contains BIOS and rack server specific code
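As a rough illustration of the bundle split above, the sketch below looks up which bundles need to be downloaded for a given set of components. The dictionary keys, the component labels and the function name `bundles_for` are illustrative assumptions, not UCSM identifiers.

```python
# Bundle contents as listed on the slide (release 1.4 and later).
# Keys and component labels are illustrative names, not UCSM identifiers.
BUNDLE_CONTENTS = {
    "infra": {"FI", "IOM", "UCSM"},
    "b-series": {"blade BIOS", "blade-specific code"},
    "c-series": {"rack server BIOS", "rack-server-specific code"},
}


def bundles_for(components):
    """Return the sorted names of the bundles that carry any requested component."""
    wanted = set(components)
    return sorted(
        name for name, contents in BUNDLE_CONTENTS.items() if contents & wanted
    )


if __name__ == "__main__":
    # Upgrading UCSM and a blade BIOS needs two separate downloads.
    print(bundles_for(["UCSM", "blade BIOS"]))
```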
All firmware work is done from the Equipment tab in UCSM
Packages can be viewed or deleted from the “Packages” tab
Bundles are downloaded from the “Download Tasks” tab
Downloads can come from the desktop or over ftp/scp/sftp/tftp
Download FCS bundles from Cisco.com:
B-Series packages
C-Series packages
FI, IOM, and UCSM software
Pre-1.4 bundles are a single download
1.0-1.3 bundles
Upgrade process B Series
Upgrade Process
Again, always consult the release notes
Upgrading through the GUI is easiest
The general process is:
• Back up the UCS config (Full State and All Configuration)
• Download code
• Update components
• Activate components in this order (check the release notes, because the order can change):
  – Interface cards (Set Startup Version Only)
  – CIMC
  – IOM (Set Startup Version Only)
  – UCSM
  – FI
Updating Components
Update means copying the new code to the backup location of all UCSM components
This simply stages the new code
All components can be updated at once
Time to update varies by component
IOMs take a long time, up to 5 minutes
If any component has issues, check the FSM for that component
The Update step does not apply to the FI
Once everything is in the “Ready” state, you can move on to Activate
Activate Components
In this process you activate the code that you copied
Some code is activated but set to “activate on next reboot”
Be aware that this stage can cause outages
Activate the “leaves of the tree” first
–The BU uses this term to mean that the order should be
• Interface card = leaf
• CIMC = twig
• IOM = branch
• UCSM = trunk
• FI = root
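The leaf-to-root ordering can be sketched as a simple sort. This is only an illustration: the component labels and the `activation_plan` helper are assumptions made for the example, and the release notes remain the authority on the actual order for a given version.

```python
# Leaf-to-root activation order from the slide.
# Labels are illustrative; always confirm the order in the release notes.
ACTIVATION_ORDER = ["interface card", "CIMC", "IOM", "UCSM", "FI"]


def activation_plan(components):
    """Sort the given components into leaf-to-root activation order."""
    rank = {name: position for position, name in enumerate(ACTIVATION_ORDER)}
    unknown = [c for c in components if c not in rank]
    if unknown:
        raise ValueError(f"unknown component(s): {unknown}")
    return sorted(components, key=lambda c: rank[c])


if __name__ == "__main__":
    print(activation_plan(["FI", "IOM", "interface card", "UCSM", "CIMC"]))
```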
Activate Blade Components
Recommended method is to use policies
–Host Firmware Policy to apply latest BIOS, Board Controller, Adapters, etc.
Activate Interface cards
Select “Set Startup Version Only”
Unchecking this box causes a blade reboot!
Activate CIMC
CIMC can be activated without disruption to the OS on the blade
The KVM session will be lost while activating
Activate IOM
Same as the interface card: “Set Startup Version Only”
The IOM needs to be at the same version as the FI!
Activate UCSM
Will cause UCSM to disconnect
Takes a few minutes
Activate Fabric Interconnect
Recommended to activate one FI at a time
–A complete outage will not occur
–Fail one fabric
–Wait for all network and FC traffic to fail over to the second fabric
Highly recommended to have an outage window
–Biggest risk is SAN storage
The FI will upgrade and reboot
Part of the process is to reboot the connected IOMs as well
It can take up to 10-15 minutes for the FI and all IOMs to come back online
If any failure occurs during the first FI upgrade, STOP! Do not attempt to upgrade the second FI
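The "one FI at a time, stop on the first failure" rule can be sketched as a loop. This is a minimal sketch under stated assumptions: `activate` stands in for whatever performs and verifies the activation (it is not a UCSM call), and the FI names are placeholders.

```python
def activate_fabric_interconnects(fis, activate):
    """Activate FIs one at a time, stopping at the first failure.

    fis      -- FI names in the order to upgrade
    activate -- hypothetical callable returning True once the FI (and its
                IOMs) are back online on the new version

    Returns (upgraded, failed): the FIs completed and the FI that failed,
    or None if every activation succeeded.
    """
    upgraded = []
    for fi in fis:
        if not activate(fi):
            # STOP: leave the remaining FI untouched so one fabric stays up.
            return upgraded, fi
        upgraded.append(fi)
    return upgraded, None


if __name__ == "__main__":
    # Simulated run in which the first FI fails to come back.
    print(activate_fabric_interconnects(["FI-B", "FI-A"], lambda fi: fi != "FI-B"))
```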
Activate FI from Equipment tab
Upgrade subordinate first
Choose correct Kernel and System Version
FI will take a few minutes and then reboot
IOMs will get updated as well
Verify Fabric Interconnect upgrade
Make sure IOM and FI all match the correct running version
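A version check of this kind can be sketched as a comparison against the expected release. The component names and version strings below are made-up sample values, and `version_mismatches` is a hypothetical helper rather than a UCSM function.

```python
def version_mismatches(running, expected):
    """Return the components whose running version differs from the expected one."""
    return {name: version for name, version in running.items() if version != expected}


if __name__ == "__main__":
    # Sample values only; real versions come from the UCSM firmware view.
    running = {
        "FI-A": "2.1(1a)",
        "FI-B": "2.1(1a)",
        "IOM 1/1": "2.1(1a)",
        "IOM 1/2": "2.0(4b)",
    }
    print(version_mismatches(running, "2.1(1a)"))
```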
Upgrade Primary Fabric Interconnect
Now upgrade the primary FI using the same process
UCSM will fail over to the subordinate FI
You will need to log back in to UCSM
Problems
The biggest concern is a failed IOM upgrade
–There is no way to upgrade an IOM manually in the field
–RMA the failed IOM
–Can attempt a physical reseat of the IOM
A failed FI upgrade can be recovered
–Similar to the N5K, recovery requires console access and a TFTP server to boot from
–Refer to the FI recovery method
Host firmware
Highly recommended that the blade BIOS match the running UCSM version
The best way to upgrade the BIOS is through a Host Firmware Policy
–Create the policy in UCSM
–Apply the policy to the SP (service profile)
This will reboot the blade, so an outage window is needed
Create Host Firmware Policy
From Server Tab
Host Firmware Policy
Note that a Firmware Policy can include
–Adapters, BIOS, Board Controller, FC Adapters, HBA Option ROM and Storage Controller
Note how Adapters and FC Adapters can be part of a policy
–If adapters are part of a policy, they can only be changed as part of the firmware policy
Recommended to upgrade BIOS and Storage Controller at a minimum
Board Controller rarely changes and is specific to B230 and B440
Set BIOS versions
Best to choose all hardware
Set BIOS to the latest in the pull down for each blade/server
Latest BIOS version will be different for some servers
Add the new Firmware Policy to an SP
Select the Host Firmware policy
The blade will reboot once you “Save Changes”
Additions / Modifications from version 2.1
– Firmware Auto Install
– Install Infrastructure Firmware
– Install Server Firmware
We just made it simple to upgrade
Firmware Auto-Install
Firmware Auto-Install implements package-version-based upgrades for both UCS infrastructure components and server components
Firmware Auto-Install cannot be used to upgrade Management Extensions and the Capability Catalog. These are simple, occasional updates in UCSM and are therefore left under user control.
It is a two-step process: “Install Infrastructure Firmware” and “Install Server Firmware”.
It is recommended to run “Install Infrastructure Firmware” first and then “Install Server Firmware”.
All existing firmware upgrade mechanisms are retained. Users who do not want to use Auto-Install can continue to use the existing documented way of doing firmware upgrades.
Install Infrastructure Firmware (contd)
This is the sequence followed by “Install Infrastructure Firmware”:
1. Upgrade UCSM
2. Update backup image of all IOMs
3. Activate all IOMs with setstartup option
4. Activate secondary Fabric Interconnect
5. Wait for User Acknowledgement***
6. Activate primary Fabric Interconnect
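The sequence above, including the pause for user acknowledgement, can be modelled as follows. This is only a sketch of the control flow: `run_install_infra` and the `user_acknowledged` callback are hypothetical names, and nothing here talks to UCSM.

```python
# Steps as listed on the slide, in order.
INFRA_STEPS = [
    "Upgrade UCSM",
    "Update backup image of all IOMs",
    "Activate all IOMs with setstartup option",
    "Activate secondary Fabric Interconnect",
    "Wait for User Acknowledgement",
    "Activate primary Fabric Interconnect",
]


def run_install_infra(user_acknowledged):
    """Walk the sequence and return the steps that completed.

    user_acknowledged -- hypothetical callable returning True once the
    operator confirms the data path is ready for the primary FI reboot.
    """
    completed = []
    for step in INFRA_STEPS:
        if step == "Wait for User Acknowledgement" and not user_acknowledged():
            break  # the primary FI is never touched without explicit consent
        completed.append(step)
    return completed


if __name__ == "__main__":
    # Without acknowledgement the run stops after the secondary FI.
    print(run_install_infra(lambda: False))
```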
Install Infrastructure Firmware GUI
Install Infrastructure Firmware - Cancelling
• A scheduled “Install Infra” operation can be cancelled
• An “Install Infra” operation that is already “In Progress” cannot be cancelled.
• Both GUI and CLI options are available for cancelling.
Install Infrastructure Firmware – User Acknowledgement for primary FI
• “Install Infra” expects an explicit permission from user to start firmware upgrade on primary Fabric Interconnect.
• This is necessary to protect the data path for servers.
• As part of “Install Infra”, secondary FI’s firmware is upgraded first.
• Secondary FI reboots as part of firmware activation.
• After secondary FI comes online, users are expected to check if the data path is ready for a reboot of primary FI
• When users have ensured that the data path is ready, they can acknowledge reboot of primary FI.
Acknowledge Primary FI reboot
Install Server Firmware
• Install-Server offers a way to update multiple host firmware packages using package versions.
• It provides the list of Service Profiles that will be affected when a host firmware package is modified. Multiple SPs can use the same host firmware package.
• It also provides a final summary of physical servers that will be rebooted for the set of host firmware packages that are getting modified.
• Only GUI is available for "Install Server Firmware". No CLI.
Install Server Firmware – Screens 1 to 6
Troubleshooting the Cisco Unified Computing System
Chetan Badami, Cisco TAC
Agenda
Troubleshooting UCSM & Fabric Interconnect
–Fault types
–Clustering issues
–Common issues
Blade Servers
IOM & Chassis
UCS System Components
UCS manager
UCS Fabric Interconnect (6xxx)
UCS Fabric Extenders (2xxx)
UCS 5100 Blade Chassis
UCS B-series servers
Nexus 2000 switch
UCS C-series servers
UCS Network adapters
UCS 6200 Fabric Interconnect (FI)
Standalone or Clustered (Primary / Subordinate)
Data Management Engine (DME)
[Diagram labels: FI-A# and FI-B#, IP #A and IP #B, Virtual IP, Management Network, Cluster links, a DB on each FI]
UCSM
UCSM GUI
CLI: UCS-A# scope server x/y
NXOS: UCS-A# connect nxos a
      UCS-A(nxos)# show…
XML API
Fault Types
Type: Description
fsm: An FSM task has failed to complete successfully, or Cisco UCS Manager is retrying one of the stages of the FSM.
equipment: Cisco UCS Manager has detected that a physical component is inoperable or has another functional issue.
server: Cisco UCS Manager cannot complete a server task, such as associating a service profile with a server.
configuration: Cisco UCS Manager cannot successfully configure a component.
environment: Cisco UCS Manager has detected a power problem, thermal problem, voltage problem, or loss of CMOS settings.
management: Cisco UCS Manager has detected a serious management issue, such as critical services that could not be started or components with incompatible firmware versions.
connectivity: Cisco UCS Manager has detected a connectivity problem, such as an unreachable adapter.
network: Cisco UCS Manager has detected a network issue, such as a link down.
operational: Cisco UCS Manager has detected an operational problem, such as a log capacity issue or a failed server discovery.
Events per Component
FarNorth-A# scope server ?
  WORD          <chassis-id>/<blade-id>
  dynamic-uuid  Dynamic UUID
FarNorth-A# scope server 1/1
FarNorth-A /chassis/server # show event
UCSM Faults - GUI
Information Fault
Major Fault
Finite State Machine (FSM)
Workflow with many stages
Data Management Engine (DME) … Application Gateway (AG) … End Point (EP)
<Object><Workflow><Operation><Where-is-it-executed>
Error Description for that stage
Stage Description
Operation (workflow)
FSM Details
Contexts
UCS has three CLI “contexts”
UCSM (GUI equivalent, uses the “scope” command)
NXOS (not configurable, read only)
Local-Management (file management, tech support, reboot)
Scope
Scoping – movement to different UCS configuration components
Details on hardware components are obtained with the connect command
You want to be on the Primary Fabric Interconnect
UCS-B# scope ?
  adapter              Mezzanine Adapter
  chassis              Chassis
  eth-server           Ethernet Server Domain
  eth-storage          Ethernet Storage
  eth-traffic-mon      Ether Traffic Monitoring Domain
  eth-uplink           Ethernet Uplink
  fabric-interconnect  Fabric Interconnect
  fc-storage           FC Storage
  fc-traffic-mon       FC Traffic Monitoring Domain
  fc-uplink            FC Uplink
  fex                  FEX (fabric-extender) Module
  firmware             Firmware
  host-eth-if          Host Ethernet Interface
  host-fc-if           Host FC Interface
  license              License
  monitoring           Monitor the system
  org                  Organizations
  power-cap-mgmt       Power Cap Mgmt
  security             security mode
  server               Server
  service-profile      Service Profile
  system               Systems
  vhba                 vHBA
  vnic                 vNIC
Connect - Hardware Troubleshooting
FarNorth-B# connect
  adapter     Mezzanine Adapter
  bmc         Baseboard Management Controller (CIMC)
  clp         Connect to DMTF CLP
  iom         IO Module
  local-mgmt  Connect to Local Management CLI
  nxos        Connect to NXOS CLI
Connect attaches you to hardware and to read-only NXOS
FarNorth-A# connect local-mgmt
  <CR>  (defaults to primary)
  a     Fabric A
  b     Fabric B
FarNorth-A(local-mgmt)# ?
  cd                Change current directory
  clear             Reset functions
  cluster           Cluster mode
  connect           Connect to Another CLI
  copy              Copy a file
  cp                Copy a file
  delete            Delete managed objects
  dir               Show content of dir
  enable            Enable
  end               Go to exec mode
  erase             Erase
  erase-log-config  Erase the mgmt logging config file
  exit              Exit from command interpreter
  install-license   Install a license
  ls                Show content of dir
  mkdir             Create a directory
  move              Move a file
  mv                Move a file
  ping              Test network reachability
  pwd               Print current directory
  reboot            Reboots Fabric Interconnect
  rm                Remove a file
  rmdir             Remove a directory
  run-script        Run a script
  show              Show running system information
  ssh               SSH to another system
  tail-mgmt-log     Tail mgmt log file
  telnet            Telnet to another system
  terminal          Set terminal line parameters
  top               Go to the top mode
  traceroute        Traceroute to destination
Most dangerous: erase configuration, reboot
Connect NXOS
Used to assist in troubleshooting; very similar to IOS and Nexus, with all the show commands
Used to run debugs advised by TAC
Commands:
–Show switch running config (non server config)
–Clear interface counters found on the FI
Cannot be used to configure UCS (read only)
Connect to NXOS
FarNorth-A# connect nxos ?
  <CR>
  a  Fabric A
  b  Fabric B
Popular examples:
show run
show fex detail
show interface
show lacp
show trunk
show cdp
debug
show npv flogi-table
show mac-address-table
FarNorth-A(nxos)# ?
  clear         Reset functions [Only place to clear counters]
  cli           CLI commands
  debug         Debugging functions
  debug-filter  Enable filtering for debugging functions
  ethanalyzer   Configure cisco packet analyzer
  interface     A live capture will start on following interface
  no            Negate a command or set its defaults
  ntp           NTP configuration
  show          Show running system information
  system        System management commands
  terminal      Set terminal line parameters
  test          Test command
  undebug       Disable Debugging functions (See also debug)
  end           Go to exec mode
  exit          Exit from command interpreter
  pop           Pop mode from stack or restore from name
  push          Push current mode to stack or save it under name
  where         Shows the cli context you are in
UCSM – Common issues
Is the other FI up and operational?
Are clustering links up?
Is there at least 1 chassis successfully discovered on both FIs?
UCS-A# show cluster extended-state
UCS-A# show pmon state
UCS-A(local-mgmt)# cluster lead a
UCS-A(local-mgmt)# cluster force primary
UCS-A /monitoring/sysdebug # show cores
DME Clustering problems
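When cluster state is collected from several domains, the output of `show cluster extended-state` can be checked with a short script. The sketch below assumes the output contains per-fabric lines of the form `A: UP, PRIMARY` and a final `HA READY` line; the `SAMPLE` text is illustrative rather than captured from a real system, so the patterns may need adjusting for your release.

```python
import re

# Illustrative excerpt only; the assumed line formats may differ by release.
SAMPLE = """\
A: UP, PRIMARY
B: UP, SUBORDINATE
HA READY
"""


def parse_cluster_state(text):
    """Return ({fabric: (member_state, lead_state)}, ha_ready) from CLI output."""
    roles = {}
    ha_ready = False
    for raw in text.splitlines():
        line = raw.strip()
        match = re.fullmatch(r"([AB]): (\w+), (\w+)", line)
        if match:
            roles[match.group(1)] = (match.group(2), match.group(3))
        elif line == "HA READY":
            ha_ready = True
    return roles, ha_ready


if __name__ == "__main__":
    print(parse_cluster_state(SAMPLE))
```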
Sample – Cluster state
Sample – Process state (pmon)
Agenda
Troubleshooting UCSM & Fabric Interconnect
Blade Servers
–CIMC/BIOS
–OBFL/SEL
IOM & Chassis
Blade servers
Blade overview – Hardware & Software Components
CPU & Heatsink
Memory DIMMs
Mezzanine Adapter
CIMC
HDD
Blade servers
Blade overview – CIMC and BIOS
CIMC
– Monitors temperature and power readings
– KVM & vMedia
– Blade control
BIOS
– Can be configured via F2 or via BIOS Policy
OBFL
Onboard Failure Logging (OBFL) stores hardware logs on the different components, saved at the time of the issue.
Alternatively, the logs can be viewed by connecting to the internal component end device.
show tech-support captures the logs required for support.
System Event Log (SEL) – Events Supported
Server BIOS events
3 kinds of equipment end-points:
–Memory Unit (DIMM): ECC errors, Address Parity, Memory Mismatch
–Processor Unit: Memory Mirroring, Sparing, SMI Link errors
–Motherboard: PCIe, QPI uncorrectable errors, Legacy PCI errors
All these errors are modeled as stats properties. Those without defined thresholds are reported as statistics only
BMC, BIOS and OS log platform errors to the CIMC’s System Event Log (SEL) buffer
–POST and run-time errors
–Used as an effective health monitoring tool
System Event Logs
Make sure that servers are discovered
Make sure the backup destination path is valid
Can also be done via the CLI
System Event Logs = Management Logs on earlier releases
Chassis
Server
CIMC Booting Problems - Blades
Corrupt CIMC firmware
POST failure
Not completing boot
Connect to the CIMC in-band manager to diagnose
View logs, collect tech-support, monitor KVM output
Manually reboot the CIMC
Fault codes: http://www.cisco.com/en/US/partner/docs/unified_computing/ucs/ts/faults/reference/ErrMess.html
Connecting to CIMC Debug Utility
Use it to verify the health of a blade when UCSM is in question and you want to look at the lowest level of blade data points
Used to determine blade component issues at the source
UCS-A# connect cimc 1/1   (Chassis 1, Server 1, motherboard CIMC)
Trying 127.5.1.1...
Connected to 127.5.1.1.
Escape character is '^]'.
CIMC Debug Firmware Utility Shell
Debug Firmware Utility commands:
alarms, cores, exit, help [COMMAND], images, mctools, memory, messages, network, obfl, post, power, sensors, sel, fru, mezz1fru, mezz2fru, tasks, top, update, users, version
Blade servers – Common issues
CIMC issues
Server discovery failed
– Check minimum software version
– Reseat blade
– Minimum hardware satisfied?
No KVM video
– Does the CIMC have an IP?
Is the BIOS corrupt?
– Recover BIOS
– Reset CMOS
UCS-A# show version
UCS-A /system # show capability
UCS-A /chassis/server/cimc # show mgmt-if
UCS-A /chassis/server # show post
UCS-A /chassis/server # reset-kvm
UCS-A /chassis/server # recover-bios <file>
UCS-A /chassis/server # reset-cmos
Blade servers – Common issues
Hardware issues
Blade won’t boot
– Did POST complete?
Types of DIMM errors
– Mapped out
– Disabled
– Inoperable
– Degraded
UCS-A# connect cimc x/y
[ help ] # post
[ post ] # obfl
[ obfl ] # sel
UCS-A /chassis/server # show memory [detail]
UCS-A /chassis/server/memory-array/dimm # show stats memory-error-stats detail
Blade servers – Common issues
Unexpected reboot
Service profile modifications
– Firmware updates
– Configuration changes
– Use Maintenance policies to defer changes
OS initiated
– Check OS
Hardware issue
IOM / FI issues
UCS-A /chassis/server # show fsm status
UCS-A# connect cimc x/y
[ help ] # post
[ post ] # obfl
[ obfl ] # sel
Blade servers – Top 5 commands
UCS-A /chassis/server # show inventory expand detail
UCS-A /chassis/server # show status detail
UCS-A /chassis/server # show post
UCS-A /chassis/server # show sel
UCS-A /chassis/server # show fsm status
Agenda
Troubleshooting UCSM & Fabric Interconnect
Blade Servers
IOM & Chassis
–Discovery issues
–Fan/Thermal/PSU
–Tech-support
IOM & Chassis
Overview
CMC responsibilities
– Chassis Discovery
– Local cluster management
– Power & Thermal Management
[Diagram labels: Chassis Management Controller, FLASH, EEPROM, DRAM, Control IO, Chassis Signals, Switch ASIC, 1 - 4 fabric links to Interconnect, To Blades]
IOM & Chassis – Common issues
Chassis not discovering
Check chassis discovery policy
Server ports defined correctly
FI to IOM 1:1 relationship only
UCS-A(nxos)# show run interface ethernet x/y
UCS-A(nxos)# show interface fex-fabric
UCS-A(nxos)# show fex <chassis#> detail
IOM & Chassis – Common issues
Fan issues
Spinning at 100%
– Temperature
– Any fans missing?
– CMC access to thermal sensors
– Component discovery
UCS-A# connect iom 1
fex-1# show platform software cmcctrl thermal status
fex-1# show platform software cmcctrl fancontrol all
fex-1# show platform software cmcctrl ohms all
Logs for troubleshooting
General UCS issues
UCS-A(local-mgmt)# show tech-support ucsm detail
UCS-A(local-mgmt)# show tech-support chassis # all detail
Networking issues
Upstream_Switch# show tech-support details
SAN issues
UCS-A(nxos)# show tech-support npv
MDS# show tech-support details
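The mapping from issue type to log-collection commands can be kept as a small lookup table, for example in a case-handling checklist script. The commands and prompts are the ones listed above; the category keys, the `{chassis}` placeholder and the `commands_for` helper are assumptions made for this sketch.

```python
# Commands from the slide, grouped by issue type. "{chassis}" replaces the
# "#" chassis-number placeholder; the category keys are illustrative.
TECH_SUPPORT = {
    "general": [
        ("UCS-A(local-mgmt)#", "show tech-support ucsm detail"),
        ("UCS-A(local-mgmt)#", "show tech-support chassis {chassis} all detail"),
    ],
    "networking": [
        ("Upstream_Switch#", "show tech-support details"),
    ],
    "san": [
        ("UCS-A(nxos)#", "show tech-support npv"),
        ("MDS#", "show tech-support details"),
    ],
}


def commands_for(issue, chassis=1):
    """Return the prompt-prefixed commands to run for the given issue type."""
    return [
        f"{prompt} {command.format(chassis=chassis)}"
        for prompt, command in TECH_SUPPORT[issue]
    ]


if __name__ == "__main__":
    for line in commands_for("general", chassis=2):
        print(line)
```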
UCSM and Chassis show tech from GUI
Log into the UCSM GUI
Select the Admin tab -> Faults, Audit and Event Logs section -> Tech Support File
Where to find more information
Hardware Installation & Service Guides Information: http://www.cisco.com/en/US/docs/unified_computing/ucs/overview/guide/UCS_roadmap.html#wp38892
Release Notes: http://www.cisco.com/en/US/products/ps10281/prod_release_notes_list.html
Software Upgrade & Installation Information: http://www.cisco.com/en/US/products/ps10281/prod_installation_guides_list.html
UCS Troubleshooting Guide: http://www.cisco.com/en/US/docs/unified_computing/ucs/ts/guide/UCSTroubleshooting.html
UCS Faults Reference: http://www.cisco.com/en/US/docs/unified_computing/ucs/ts/faults/reference/ErrMess.html
Cisco Support Community: https://supportforums.cisco.com/community/netpro/data-center/unified-computing
Upcoming Sessions
March “Month of Routing Protocol Technology”
• Session 9 – 11th Mar 2014
• Session 10 – 25th Mar 2014
April “Month of Wireless Technology”
And many more months and technologies