FAQs: Active IQ

NetApp, May 01, 2020

This PDF was generated from https://docs.netapp.com/us-en/active-iq-1/reference_aiq_srd_health_summary.html on May 01, 2020. Always check docs.netapp.com for the latest.


FAQs

Health summary

What is the purpose of the health summary section?

The Health Summary section proactively identifies risks in deployed NetApp® storage configurations that can negatively affect system performance, availability, and resiliency. Each risk entry contains information about the specific risk to the system, potential negative effects, and links to risk mitigation plans. By addressing identified risks proactively, you can significantly reduce the possibility of unplanned downtime for your NetApp storage system.

What is the access policy for this health summary module?

Like the rest of Active IQ, this module is accessible to all customers whose systems are covered by a valid hardware warranty contract and have AutoSupport enabled.

Is there a requirement to correct risks that are identified?

NetApp recommends resolving identified risks within the suggested time frames to avoid adverse system impacts. The details of each risk include a severity with the recommended time frame in which the resolution should be implemented: for example, immediately, next scheduled maintenance, and so on. Not resolving identified risks increases your chance of encountering system issues that would have been avoidable if corrective measures had been taken.

Is a support case automatically opened for identified risks?

No, cases are not automatically opened for risks.

What are the system hardware and software requirements?

The following are the software and hardware requirements for system risk analysis:

• AutoSupport enabled

• ONTAP and E-Series based systems

Are all risks to the system identified?

For functionality that is targeted for risk identification, there are exceptions as follows:

• Risks that cannot be identified by analysis of AutoSupport logs are not included. For example, client application configuration parameters that are not controlled by ONTAP.

• Risks that do not currently have a risk “signature”, a code for detecting the risk in the AutoSupport, are not included. NetApp is continually adding new risk signatures to expand coverage of identified risks.

Reporting and mitigation

How are risks identified for a system?

Risks are identified by an automated analysis of the most recent AutoSupport received from a system.

Why does my mitigated risk still show up after I fixed it?

All risks are identified based on the most recent AutoSupport. As a result, a mitigated risk continues to be shown until a new AutoSupport log is received for the system. You can trigger a complete AutoSupport manually if you want the results to refresh in Active IQ sooner. Currently, it can take up to 24 hours for results to refresh on Active IQ after receipt of an AutoSupport.

Are any of the risk items self-correcting?

No. Risks that are identified are persistent risks that will not self-correct. Planned manual intervention is required in order to mitigate risks.

Does risk mitigation require system downtime?

Some risks may be safely corrected without any interruption to system availability, while others might require planned downtime. The information under “Corrective Actions” and/or your NetApp support representative will recommend the correct procedures to follow. Risk severity is a good indication of how urgently the identified risk should be mitigated.

What does the impact level indicate?

Impact Level is based on Potential Impact.

Factor | Description

Impact Level | Impact Level assesses the capability of the system to continue operation without suffering a potential outage. For example, a high impact level indicates urgency, and immediate action should be taken to mitigate the risk, whereas a low impact level can wait until the next scheduled maintenance window.

Potential Impact | Potential Impact explains what may occur if the risk identified is not mitigated. For example, a low impact risk might not affect system availability and only generate frequent console messages, whereas a high impact will most likely result in unplanned system downtime.

Impact can be High, Medium, Low, or Best Practice, and always considers the Potential Impact. The Potential Impact is displayed in the details field of the risk.

Where can I find the steps needed to mitigate a risk?

The Corrective Action field in the risk details page contains links to customer support bulletins (CSBs), public reports for bugs, or knowledge base (KB) articles that cover risk mitigation plans. In some instances you might see a mitigation difficulty indication listed in the CSB or KB article.

What types of risks are detected?

The number of risks that can be detected is regularly increasing. Risks generally fall within the following categories:

Category | Description

Hardware Failures | System is found to have failed or degraded hardware components. This covers platform, storage, disk drive, and HA related risks.

Non-supported Configurations | System is found to violate restrictions documented in NetApp documentation, such as the system configuration guides. For example, cards installed in unsupported slots in the controller.

Resource Depletion | System is found to have significant resource depletion. For example, no spare disks.

Nearing or exceeding operational limits | The system is found to be nearing or exceeding operational or upgrade limits. For example, exceeding flexible volume limits that result in the system falling outside of non-disruptive upgrade capabilities.

Customer Support Bulletins (CSBs) | The system is found to match a condition related to a CSB. For example, hardware that is operational but falls under end of support (EOS).

Best practice misalignment | The system configuration is misaligned with NetApp best practices. Although NetApp highly recommends aligning with best practices, there are exceptions that might be warranted for specific configurations. As a result, some of these types of risks might not need mitigation.

What information is reported for each risk?

Five fields are reported for each risk identified on the system. They are:

Field | Description

Impact Level | The severity the risk can have to the system.

Category | See “What types of risks are detected?” for more information about categories.

Risk | The short description or title of the risk identified.

Details | A more detailed description of the specific issue, severity, and potential impact to the system.

Corrective Action | Links to documentation that is used for risk mitigation, such as CSBs and KB articles.

Risks are reported based on AutoSupport data that is sent to NetApp. Risks are identified per system, so you will know exactly which system is experiencing the risk.

Why should I acknowledge a risk and how do I do it?

Some risks may not apply to a specific customer environment because of the nature of the application, or the system may be at a stage in its lifecycle at which certain risks do not matter. Also, in certain situations, customers may plan to mitigate certain risks periodically through regularly scheduled maintenance windows. However, irrespective of the situation, it is an operational best practice to acknowledge a risk in order to look at the true health of your installed base.

Follow the steps below to acknowledge a risk:

• Click the Health Summary tab in the left navigation.

• Identify the risk you wish to tag, and then click on the acknowledge flag.

• Select systems for which you want to acknowledge the risk.

• Fill in the Approved By and Justification fields.

• Acknowledge the risk by clicking the Acknowledge button at the bottom of the dialogue box.

How can I get a regular update on my system risks?

The best way to keep yourself updated on risks in your installed base is to schedule a regular risk report. You can click Schedule a Risk Report on the Health Summary tab or navigate to the Reports tab on the top menu of Active IQ to schedule a regular risk report.

You can schedule a report by risk impact at a frequency and in a format (PDF, PPT, or XLS) of your choice. This allows you to see risks easily without having to visit the Active IQ portal.

Is the risk information available in the Active IQ mobile app?

Yes, system risk information is available in the Active IQ mobile app. You can download the mobile app from the following locations:

iOS - https://itunes.apple.com/us/app/my-autosupport/id1230542480?ls=1&mt=8

Android - https://play.google.com/store/apps/details?id=com.netapp.myautosupport

FAQs

How do I access the AFF Efficiency in Active IQ?

1. Open the Active IQ home page.

2. Search for a cluster from the top right search box to reach the cluster dashboard. The efficiency portlet on the dashboard displays the top-level ratios. Click to see further details.

Does Active IQ display ratios for all AFF systems?

Active IQ displays ratios for All-Flash systems running ONTAP 8.3.2 and later.

What is the ‘Without Snapshots’ checkbox in the AFF efficiency dashboard?

By default, the Active IQ AFF Efficiency Dashboard calculates the overall ratio at the cluster level, node level, and aggregate level. The overall ratio includes the ratio from the following storage efficiency technologies:

• Deduplication

• Compression

• Data compaction

• Snapshots

• Clones

When you select the Without Snapshots checkbox, the tool calculates the data reduction ratio (the ratio from the deduplication, compression, data compaction, and clones storage efficiency technologies) at the cluster level, node level, and aggregate level.

This option is provided to support customers who have signed up for the NetApp storage guarantee program, which guarantees an x:1 data reduction ratio based on the customer workload. Customers can use the Active IQ Storage Efficiency dashboard to monitor data reduction ratios during the guarantee period.

To learn more about the NetApp storage guarantee program, refer to All-Flash Guarantee.

How are aggregate-level, node-level and cluster-level ratios calculated? My customer’s systems are behind a secure DMZ/CMZ and cannot send AutoSupport to NetApp. How do I calculate the ratios for those systems?

The ratios are calculated as described below. You can follow the same steps to calculate the ratios manually.

1. Aggregate-level ratios

Aggregate Overall Ratio

The overall ratio at the aggregate level is obtained directly from ONTAP using ZAPI. It can also be obtained from the aggr-efficiency.xml section in AutoSupport.

For systems that do not have access to AutoSupport, ‘aggregate show-efficiency …’ CLI commands can be used.

Aggregate Data Reduction Ratio

The formula to calculate aggregate-level data reduction ratio is as follows:

2. Node-level and Cluster-level ratios

Node/Cluster Overall Ratio

Active IQ uses data from aggr-efficiency.xml output to calculate node/cluster-level overall ratio.

For systems that do not have access to AutoSupport, you can use data from ‘aggregate show-efficiency -advanced’ to calculate node/cluster-level overall ratio.

Follow the steps below to calculate Node/Cluster overall ratio:

1. Sum ‘Total/Cumulative Logical Used’ and ‘Total/Cumulative Physical Used’ for all the aggregates in the node/cluster to get ‘Node/Cluster Logical Used’ and ‘Node/Cluster Physical Used’ respectively.

2. Divide ‘Node/Cluster Logical Used’ by ‘Node/Cluster Physical Used’ to get node/cluster overall ratio.
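
The two steps above can be sketched in Python as follows. This is a minimal illustration, not Active IQ's implementation: the `Aggregate` record, its field names, and the sample values are assumptions made for the example. In practice, the logical-used and physical-used figures would be read for each aggregate from the AutoSupport section or CLI output mentioned above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Aggregate:
    # Hypothetical record; field names are illustrative, not ONTAP output keys.
    name: str
    logical_used: float   # 'Total/Cumulative Logical Used', any consistent unit
    physical_used: float  # 'Total/Cumulative Physical Used', same unit


def overall_ratio(aggregates: Iterable[Aggregate]) -> float:
    """Return summed logical used divided by summed physical used."""
    aggregates = list(aggregates)
    # Step 1: sum logical and physical used across all aggregates.
    logical = sum(a.logical_used for a in aggregates)
    physical = sum(a.physical_used for a in aggregates)
    if physical <= 0:
        raise ValueError("summed physical used must be positive")
    # Step 2: divide the two sums.
    return logical / physical


# Made-up sample values (TB) for two aggregates on one node.
node = [Aggregate("aggr1", 40.0, 10.0), Aggregate("aggr2", 20.0, 20.0)]
print(f"{overall_ratio(node):.2f}:1")  # prints 2.00:1
```

Because the sums are taken before dividing, larger aggregates weigh more: the result here is 2.00:1, not the 2.50:1 that averaging the per-aggregate ratios (4:1 and 1:1) would give.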

Node/Cluster Data Reduction Ratio

Active IQ uses the following steps to calculate node/cluster-level data reduction ratio.

1. Calculate ‘Data Reduction Logical Used’ and ‘Data Reduction Physical Used’ for all the aggregates in the node/cluster using the formula mentioned in Aggregate Data Reduction Ratio.

2. Sum ‘Data Reduction Logical Used’ and ‘Data Reduction Physical Used’ for all the aggregates in the node/cluster to get ‘Node/Cluster Data Reduction Logical Used’ and ‘Node/Cluster Data Reduction Physical Used’ respectively.

3. Divide ‘Node/Cluster Data Reduction Logical Used’ by ‘Node/Cluster Data Reduction Physical Used’ to get node/cluster data reduction ratio.

Which sections of AutoSupport are used for determining the efficiency ratios and how do I view the section?

The calculator leverages the aggr-efficiency.xml section in AutoSupport for ONTAP 9.x systems to calculate the node, cluster, and aggregate level ratios. This section contains efficiency information of the node the AutoSupport is transmitted from and its HA pair. In ONTAP 8.3.2 systems such a section is not available and so the calculator leverages various other sections in AutoSupport to arrive at the ratios, but the approach is the same as ONTAP 9.x.

For the volume level ratio calculations, we use the df -s section of AutoSupport. Volume level calculations are arrived at using the following formula:

Vol [n] (Eff ratio) = [ df-s (used) + df-s (saved) ] / df-s (used)

Volume level ratios only include savings contributions from deduplication and compression and may not add up to the node level ratios.
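
As a quick worked example of the volume-level formula, the sketch below applies it to made-up numbers. The function name and the sample values are hypothetical; `used` and `saved` stand for the corresponding columns of the df -s section, in the same unit.

```python
def volume_efficiency_ratio(used: float, saved: float) -> float:
    """Apply (used + saved) / used from the df -s formula."""
    if used <= 0:
        raise ValueError("used must be positive")
    return (used + saved) / used


# Made-up values: 500 units used on disk, 1500 units saved.
print(volume_efficiency_ratio(used=500.0, saved=1500.0))  # prints 4.0
```

A volume with no savings (saved = 0) yields a ratio of 1.0, that is, 1:1.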

These AutoSupport sections are viewable from the Raw AutoSupport Data tab in the left navigation of the cluster dashboard of Active IQ. Remember to view a weekly or a user-triggered AutoSupport.

Which AutoSupports are used for calculating efficiency ratios?

Calculations are performed using either the latest weekly or user-triggered AutoSupports, which tend to contain most of the sections required for calculating the ratios.

Which volumes or aggregates are excluded from efficiency calculations?

The following objects are NOT considered while calculating AFF efficiency ratios:

• Root aggregates

• Offline volumes

• Vserver root/admin root volumes

• MCC configuration volumes

Why do my displays look different in my laptop vs. a smartphone?

The AFF storage efficiency calculator UI is optimized for viewing on smartphones. Although there may be small differences in display, the data and content of the calculator are the same across devices.

How can I see the efficiency ratios of all my AFF systems in a single view within Active IQ?

Currently, efficiency ratios are only visible at a cluster level. Customer level views may be considered for a future release.

How can I see the trend in efficiency ratios?

Currently, efficiency ratios are based on the latest weekly or user-triggered AutoSupport. Efficiency trending may be considered for a future release.

How are customer-level ratios and efficiency savings calculated?

The customer-level storage efficiency dashboard provides the efficiency ratio with and without Snapshot copies for AFF and non-AFF systems, combined across the customer install base for systems running ONTAP 9.1 and later. The required parameters for the following calculations are taken from ONTAP AutoSupport:

Without Snapshot copies (calculated per aggregate first):

• Aggr Logical Used without Snapshot copies = Logical Size Used by Volumes, Clones, Snapshot Copies in the Aggregate – Logical Size Used by Snapshot Copies

• Aggr Physical Used Without Snapshot copies = Total Physical Used – Physical Size Used by Snapshot copies

• Customer Efficiency Ratio without Snapshot copies = Sum [Aggr Logical Used without Snapshot copies for all aggregates and for all nodes of a customer] / Sum [Aggr Physical Used without Snapshot copies for all aggregates and for all nodes of a customer] : 1

With Snapshot copies:

• Customer Logical Size with Snapshot copies = Sum [Logical Size Used by Volumes, Clones, Snapshot copies for all aggregates and for all nodes of a customer]

• Customer Physical Size Used with Snapshot copies = Sum [Total Physical Size Used for all aggregates and for all nodes of a customer]

• Customer Efficiency Ratio with Snapshot copies = Customer Logical Size with Snapshot copies and Clones / Customer Physical Size Used with Snapshot copies and Clones : 1
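
The customer-level ratios with and without Snapshot copies can be sketched as below. This is an illustrative reading of the formulas above, not the dashboard's code: the `AggrUsage` record, its field names, and the sample numbers are assumptions, with each field standing in for the AutoSupport parameter named in its comment.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AggrUsage:
    # Hypothetical field names; all values in the same unit.
    logical_used: float       # Logical Size Used by Volumes, Clones, Snapshot Copies
    logical_snapshot: float   # Logical Size Used by Snapshot Copies
    physical_used: float      # Total Physical Used
    physical_snapshot: float  # Physical Size Used by Snapshot copies


def customer_ratios(aggrs: List[AggrUsage]) -> Tuple[float, float]:
    """Return (ratio without Snapshot copies, ratio with Snapshot copies)."""
    # With Snapshot copies: sum the raw logical and physical sizes.
    logical_with = sum(a.logical_used for a in aggrs)
    physical_with = sum(a.physical_used for a in aggrs)
    # Without Snapshot copies: subtract the Snapshot share per aggregate first.
    logical_without = sum(a.logical_used - a.logical_snapshot for a in aggrs)
    physical_without = sum(a.physical_used - a.physical_snapshot for a in aggrs)
    if physical_with <= 0 or physical_without <= 0:
        raise ValueError("summed physical used must be positive")
    return logical_without / physical_without, logical_with / physical_with


# Made-up values for two aggregates belonging to one customer.
fleet = [
    AggrUsage(100.0, 60.0, 20.0, 4.0),
    AggrUsage(50.0, 10.0, 20.0, 4.0),
]
without_snap, with_snap = customer_ratios(fleet)
print(f"without Snapshot copies: {without_snap:.2f}:1")  # prints 2.50:1
print(f"with Snapshot copies:    {with_snap:.2f}:1")     # prints 3.75:1
```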

Efficiency feature table calculations:

• Total Physical Space Used:

◦ Customer Physical Space Used = Sum of Physical Space Used by the Aggregate for all aggregates of all nodes of a customer.

• Total Logical Used:

◦ Customer Logical Size Used without Snapshot copies = Sum of Logical Size Used by Volumes, Clones, Snapshot Copies - Logical Size Used by Snapshot copies for all aggregates of all nodes of a customer

◦ Customer Logical Size Used with Snapshot copies = Sum of Logical Size Used by Volumes, Clones, Snapshot Copies in the Aggregate for all aggregates of all nodes of a customer

• Total Space Saved = Total Logical Space Used – Total Physical Space Used

• Deduplication Savings: Sum of Space Saved by Volume Deduplication + Space Saved by Inline Zero Pattern Detection of each aggregate of all nodes of a customer.

• Compression Savings: Sum of Space Saved by Volume Compression of each aggregate of all nodes of a customer.

• Compaction Savings: Sum of Space Saved by Aggregate Compaction of each aggregate of all nodes of a customer.

• FlexClone Savings: Sum of (Logical Size Used by FlexClone Volumes - Physical Size Used by FlexClone Volumes) of all aggregates of all nodes of a customer.

• Snapshot copies Backup Savings: Sum of (Logical Size Used by Snapshot copies - Physical Size Used by Snapshot copies) of all aggregates of all nodes of a customer.
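
The per-feature savings listed above amount to sums over every aggregate of every node. A minimal sketch, assuming each aggregate is represented as a dictionary of counters: the key names and the sample numbers are hypothetical and only stand in for the AutoSupport values named in the list.

```python
from typing import Dict, List

# Hypothetical per-aggregate counters; key names are illustrative only.
Aggr = Dict[str, float]


def feature_savings(aggrs: List[Aggr]) -> Dict[str, float]:
    """Sum each efficiency feature's savings across all aggregates."""
    def total(key: str) -> float:
        return sum(a[key] for a in aggrs)

    return {
        # Volume deduplication plus inline zero pattern detection.
        "deduplication": total("dedupe_saved") + total("zero_pattern_saved"),
        "compression": total("compression_saved"),
        "compaction": total("compaction_saved"),
        # Logical minus physical size used by FlexClone volumes.
        "flexclone": total("clone_logical") - total("clone_physical"),
        # Logical minus physical size used by Snapshot copies.
        "snapshot": total("snapshot_logical") - total("snapshot_physical"),
    }


# Made-up counters for two aggregates.
aggrs = [
    {"dedupe_saved": 10.0, "zero_pattern_saved": 2.0, "compression_saved": 5.0,
     "compaction_saved": 3.0, "clone_logical": 20.0, "clone_physical": 4.0,
     "snapshot_logical": 30.0, "snapshot_physical": 6.0},
    {"dedupe_saved": 6.0, "zero_pattern_saved": 0.0, "compression_saved": 1.0,
     "compaction_saved": 1.0, "clone_logical": 0.0, "clone_physical": 0.0,
     "snapshot_logical": 10.0, "snapshot_physical": 2.0},
]
savings = feature_savings(aggrs)
print(savings["deduplication"], savings["flexclone"], savings["snapshot"])
# prints 18.0 16.0 32.0
```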

How do I provide feedback or ask other questions related to the calculator?

For feedback or questions, send an email to [email protected]

Copyright Information

Copyright © 2020 NetApp, Inc. All rights reserved. Printed in the U.S. No part of this document covered by copyright may be reproduced in any form or by any means - graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an electronic retrieval system - without prior written permission of the copyright owner.

Software derived from copyrighted NetApp material is subject to the following license and disclaimer:

THIS SOFTWARE IS PROVIDED BY NETAPP “AS IS” AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL NETAPP BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

NetApp reserves the right to change any products described herein at any time, and without notice. NetApp assumes no responsibility or liability arising from the use of products described herein, except as expressly agreed to in writing by NetApp. The use or purchase of this product does not convey a license under any patent rights, trademark rights, or any other intellectual property rights of NetApp.

The product described in this manual may be protected by one or more U.S. patents, foreign patents, or pending applications.

RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.277-7103 (October 1988) and FAR 52-227-19 (June 1987).

Trademark Information

NETAPP, the NETAPP logo, and the marks listed at http://www.netapp.com/TM are trademarks of NetApp, Inc. Other company and product names may be trademarks of their respective owners.