exadata performance troubleshooting methodology · title: doag_2016_presentation_final author: jim...

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Exadata Performance Troubleshooting Methodology

James Viscusi Consulting Member of Technical Staff

Andrew Bulloch Architect

Server Technologies - Maximum Availability Architecture Team

Safe Harbor Statement

THE FOLLOWING IS INTENDED TO OUTLINE OUR GENERAL PRODUCT DIRECTION. IT IS INTENDED FOR INFORMATION PURPOSES ONLY, AND MAY NOT BE INCORPORATED INTO ANY CONTRACT. IT IS NOT A COMMITMENT TO DELIVER ANY MATERIAL, CODE, OR FUNCTIONALITY, AND SHOULD NOT BE RELIED UPON IN MAKING PURCHASING DECISIONS. THE DEVELOPMENT, RELEASE, AND TIMING OF ANY FEATURES OR FUNCTIONALITY DESCRIBED FOR ORACLE'S PRODUCTS REMAINS AT THE SOLE DISCRETION OF ORACLE.

The questions

How do I monitor my Exadata environment? What parameters are most important?

What thresholds do I set to monitor my Exadata using Enterprise Manager?

How do I diagnose a performance problem involving Exadata?

Agenda

Level Setting – Exadata

What to do before problems occur?

What do we do when problems occur?

Exadata Architecture

What is Exadata?

• First and foremost, Exadata is a platform for running Oracle databases in a highly available and performant manner.

• The hardware and software stack are tightly integrated. The components are tested by Oracle to work together, which makes the solution extremely performant.

• Each generation of Exadata is designated X2, X3, X4, and so on; the number increases by one with each hardware release. Exadata is available on Intel or SPARC chipsets.

• The second part of the name is either -2 or -8, indicating the number of sockets on each compute node.
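The naming convention described on this slide can be sketched in a few lines of Python. This is only an illustration: `parse_exadata_model`, `ExadataModel`, and the regular expression are names invented for this example, not an Oracle utility, and the sketch handles only names of the form `X<generation>-<sockets>`.

```python
import re
from typing import NamedTuple


class ExadataModel(NamedTuple):
    generation: int  # hardware generation, e.g. 4 for X4
    sockets: int     # sockets per compute node: 2 or 8


# Hypothetical pattern for names such as "X4-2" or "X3-8".
_MODEL_RE = re.compile(r"^X(\d+)-(2|8)$")


def parse_exadata_model(name: str) -> ExadataModel:
    """Split a model name like 'X4-2' into generation and socket count."""
    match = _MODEL_RE.match(name.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized Exadata model name: {name!r}")
    return ExadataModel(int(match.group(1)), int(match.group(2)))
```

For example, `parse_exadata_model("X4-2")` yields generation 4 with 2 sockets per compute node, and a name with any other suffix is rejected.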

What makes up an Exadata Database Machine?

• Storage/Cell Servers

• Compute/Database Servers

• Infiniband Switches

• Ethernet Switch

What do we do before problems occur?

• Configure Enterprise Manager metric extensions

• Understand Key Performance Indicators (KPIs) for Exadata

• View KPIs using Systems and Services in Enterprise Manager

• Configure Adaptive Thresholds in EM 13 for Exadata KPIs

What are Metrics in Enterprise Manager?

• A metric is a stored piece of information used to monitor a target

• Types of metrics
  – Information collected by the EM Agent
  – Information derived from data stored in the repository

• Metric Extensions
  – Metrics that are custom-defined by users
  – Can be server side or repository side

Metrics and Thresholds

• Enterprise Manager has a comprehensive set of metrics that allow thresholds to be defined on all target types.
  – Thresholds allow for alerting if a chosen metric crosses a certain value

• Server (Compute Node) Metrics
  – Monitored as any other host target (memory, I/O, CPU, network)

• Cell Server Metrics
  – Creates incidents on all alerts received from the cell (SNMP-based)

• Database Metrics
  – Database Time Spent Waiting, Throughput, Efficiency

• One problem: Enterprise Manager monitors many metrics!
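The idea of a threshold raising an alert when a metric crosses a value can be sketched as follows. This is not Enterprise Manager code or its API; `classify` is a hypothetical helper, and it assumes a metric where higher values are worse and where the critical threshold is at or above the warning threshold.

```python
from typing import Optional


def classify(value: float, warning: float, critical: float) -> Optional[str]:
    """Return 'CRITICAL', 'WARNING', or None for a higher-is-worse metric."""
    if critical < warning:
        raise ValueError("critical threshold must not be below warning threshold")
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return None  # metric is within its normal range
```

With a warning threshold of 80 and a critical threshold of 95, a CPU utilization of 85 would be classified as a warning and 97 as critical.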

Key Performance Indicators

• What is a KPI?
  – A quantifiable measurement used to determine server health or performance

• A set of KPIs has been defined for:
  – Compute Nodes
  – Storage Servers
  – Infiniband Switches

• KPIs are defined and explained in:
  – http://www.oracle.com/technetwork/database/availability/exadata-storage-server-kpis-2855362.pdf
  – Also reference MOS Note 2094648.1

Compute and Infiniband KPIs

Compute Nodes

• CPU Utilization

• Memory Utilization

• Load Average

• Swap Utilization

Infiniband Switches

• CPU Usage

• Memory Percent Used

• Root filesystem usage

• SSH Session Count
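To make two of the compute node KPIs concrete, here is a minimal sketch that derives memory and swap utilization from Linux `/proc/meminfo`-style text. It is not how Enterprise Manager computes these metrics. The function names are invented for the example, and it assumes the `MemTotal`, `MemAvailable`, `SwapTotal`, and `SwapFree` fields are present (older kernels do not report `MemAvailable`).

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: numeric value}."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def utilization_pct(total: int, free: int) -> float:
    """Percentage of `total` that is in use; 0.0 when nothing is configured."""
    if total <= 0:
        return 0.0
    return 100.0 * (total - free) / total


def memory_and_swap_kpis(meminfo_text: str) -> Dict[str, float]:
    """Compute the two KPIs; raises KeyError if an assumed field is missing."""
    info = parse_meminfo(meminfo_text)
    return {
        "memory_utilization_pct": utilization_pct(info["MemTotal"], info["MemAvailable"]),
        "swap_utilization_pct": utilization_pct(info["SwapTotal"], info["SwapFree"]),
    }
```

On a Linux host the input could come from `open("/proc/meminfo").read()`; the load average KPI is available separately through `os.getloadavg()`.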

Storage Server Key Performance Indicators

• Use Metric Extensions to create compound metrics

• KPIs for a Storage Server aggregate read and write data
  – Create Metric Extensions (again in MOS 2094648.1)

• Disk IOPS

• Disk Throughput

• Response Time

• I/O Load

• Cell Health
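A compound metric of the kind described here combines read and write figures into a single value. The sketch below shows the arithmetic only; `DiskSample` and `cell_totals` are hypothetical names, and the actual Metric Extension definitions are the ones given in MOS Note 2094648.1, which this example does not reproduce.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class DiskSample:
    """One disk's figures for a sampling interval (illustrative fields)."""
    read_iops: float
    write_iops: float
    read_mb_per_sec: float
    write_mb_per_sec: float


def cell_totals(samples: Iterable[DiskSample]) -> Dict[str, float]:
    """Sum reads and writes across all disks of one storage server."""
    total_iops = 0.0
    total_mb_per_sec = 0.0
    for sample in samples:
        total_iops += sample.read_iops + sample.write_iops
        total_mb_per_sec += sample.read_mb_per_sec + sample.write_mb_per_sec
    return {
        "disk_iops": total_iops,
        "disk_throughput_mb_per_sec": total_mb_per_sec,
    }
```

The aggregated values, rather than the individual read and write metrics, are then the ones to chart and to put thresholds on.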

Enterprise Manager Services

• Metric Extensions with Services allow a holistic view of the storage grid
  – Incidents will be created whenever warning or critical thresholds are crossed

However… (another often-asked question)

• Using thresholds in Enterprise Manager allows users to be alerted in the event metrics show an issue
  – e.g. CPU usage exceeds a specified amount

• KPIs do not have universal values. They can differ depending on many things
  – Customer Requirements
  – Environment Usage

• Defining one set of thresholds that works for every customer/environment is not feasible

Adaptive Thresholds (new in EM 13)

• Use the collected metrics to make a data-driven recommendation for each specific system
  – Analyze the data over a 1-4 week window

• Not all metrics are eligible (but the ones we need are!)

• Two methods of collecting the data, from the paper
  – Dynamic
  – Guided

• Companion paper to the KPI paper
  – http://www.oracle.com/technetwork/database/availability/exadata-adaptive-thresholds-3102556.pdf
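The general idea of deriving thresholds from a system's own history can be illustrated with percentiles. This sketch is an assumption made for illustration and is not the analysis Enterprise Manager performs: `suggest_thresholds` is a hypothetical function, and the 95th/99th percentile defaults are arbitrary choices.

```python
import statistics
from typing import Sequence, Tuple


def suggest_thresholds(history: Sequence[float],
                       warning_pct: int = 95,
                       critical_pct: int = 99) -> Tuple[float, float]:
    """Suggest (warning, critical) values from percentiles of past samples."""
    if len(history) < 2:
        raise ValueError("need at least two samples")
    if not 1 <= warning_pct <= critical_pct <= 99:
        raise ValueError("percentiles must satisfy 1 <= warning <= critical <= 99")
    # quantiles(n=100) returns the 99 cut points between percentiles 1..99.
    cuts = statistics.quantiles(history, n=100, method="inclusive")
    return cuts[warning_pct - 1], cuts[critical_pct - 1]
```

Fed with one to four weeks of samples for a single KPI on a single system, a function like this produces values specific to that system, which is the point the slide makes about thresholds not being universal.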

Customizing Adaptive Threshold collections

Adaptive Threshold Data analytics

Adaptive Threshold Final Setting

AWR Baselines

• Collection of snapshots used for performance comparisons.

• Baselines are retained within the AWR even after the retention time for the data has been reached.

• Exadata should have a moving and a static baseline in place to capture different workloads.
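The difference between a static and a moving baseline can be sketched outside the database. The example below is purely conceptual and does not use the AWR interfaces: `Snapshot`, the `db_time_per_sec` field, and the 7-day default window are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import List, NamedTuple, Sequence


class Snapshot(NamedTuple):
    snap_id: int
    taken_at: datetime
    db_time_per_sec: float  # illustrative metric carried by each snapshot


def static_baseline(snaps: Sequence[Snapshot], first_id: int, last_id: int) -> List[Snapshot]:
    """A fixed range of snapshots, e.g. a known-good batch window."""
    return [s for s in snaps if first_id <= s.snap_id <= last_id]


def moving_baseline(snaps: Sequence[Snapshot], now: datetime, window_days: int = 7) -> List[Snapshot]:
    """The most recent snapshots; the window slides forward as time passes."""
    cutoff = now - timedelta(days=window_days)
    return [s for s in snaps if cutoff <= s.taken_at <= now]


def ratio_to_baseline(current: float, baseline: Sequence[Snapshot]) -> float:
    """How the current value compares with the baseline average (1.0 = same)."""
    return current / mean(s.db_time_per_sec for s in baseline)
```

A static baseline answers "how does today compare with a known-good period?", while a moving baseline answers "how does today compare with the recent past?", which is why having both helps with different workloads.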

What to do when problems occur?

• Review

• Rule Out Hardware

• Compare

• Drill down

A checklist can be very useful!

Rule Out Hardware

Check the Obvious

DB Machine Home Page

• Contains a lot of good information at a quick glance

• Incident Manager

• Alert Logs

• Grid Infrastructure

• ASM

• Databases

Incident Manager

Incident Manager (Contd.)

Compare

What Changed?

If there is a problem, what has changed? And who might know?

Considerations

• Patch levels (everywhere!)

• Schema

• Tunable OS parameters

• Resource Management Plans

• Code Changes

• ADDM Comparison Report

Compare Configurations - Exadata Level

Compare Configurations - Database Level

• EM Job to compare one ‘reference’ database against one or more other databases

• Job can be scheduled on a repetitive basis, or run ad-hoc
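Conceptually, comparing a reference configuration against another one is a keyed diff. The sketch below shows that idea on plain dictionaries of parameter names and values. It is not the Enterprise Manager comparison job, and `diff_config` and the sample parameter names are illustrative assumptions.

```python
from typing import Dict, Mapping, Optional, Tuple

Difference = Tuple[Optional[str], Optional[str]]  # (reference value, other value)


def diff_config(reference: Mapping[str, str],
                other: Mapping[str, str]) -> Dict[str, Difference]:
    """Return every setting whose value differs, or that exists on one side only."""
    differences: Dict[str, Difference] = {}
    for key in sorted(set(reference) | set(other)):
        ref_value = reference.get(key)   # None means "not set on the reference"
        other_value = other.get(key)     # None means "not set on the other side"
        if ref_value != other_value:
            differences[key] = (ref_value, other_value)
    return differences
```

Running the same diff for each database against the single reference gives one list of deviations per database, which is a useful starting point for the "what changed?" question.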

Compare Configurations - Schema Level

Capture Schema baselines

Compare schemas

• With a baseline

• Between different databases

Synchronize schemas

Engineered Systems Health Checks - ORAchk

So, where are we?

Ruled out hardware issues…

Ruled out configuration changes…

So… drill down into the hardware and running SQL

Storage Grid Overview

Storage Grid Performance

Max Cell Disk Limits

Hardware - DB Node View

What would I do next?

Drill down Paths

After all changes, including any environment changes, are ruled out…

• Start with AWR, ADDM and ASH

• Identify outliers / worst performing (Pareto)

• Review top wait events

• SQL Tuning Advisor

• If resource constrained consider resource limiting strategies
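The Pareto step in the list above (find the few statements responsible for most of the time) can be sketched as follows. The input is assumed to be a mapping from a SQL identifier to its elapsed time, for example as read off an AWR report; `pareto_top` and the 80% default are illustrative choices, not part of any Oracle tool.

```python
from typing import Dict, List, Tuple


def pareto_top(elapsed_by_sql: Dict[str, float],
               share: float = 0.8) -> List[Tuple[str, float]]:
    """Return the smallest set of top statements covering `share` of total time."""
    total = sum(elapsed_by_sql.values())
    if total <= 0:
        return []
    # Rank statements from most to least expensive.
    ranked = sorted(elapsed_by_sql.items(), key=lambda item: item[1], reverse=True)
    selected: List[Tuple[str, float]] = []
    running = 0.0
    for sql_id, elapsed in ranked:
        selected.append((sql_id, elapsed))
        running += elapsed
        if running >= share * total:
            break  # the selected statements now cover the requested share
    return selected
```

The statements returned are the natural candidates to look at first in the top wait events and to hand to the SQL Tuning Advisor.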

Compare ADDM

• Full ADDM analysis across two AWR snapshot periods

• Detects causes, measures effects, then correlates them
  ➢ Causes: Workload changes, Configuration changes
  ➢ Effects: Regressed SQL, Reached resource limits (CPU, I/O, memory, interconnect)

• Makes actionable recommendations along with quantified impact

[Figure: Compare Period ADDM. Labels: AWR Snapshot Period 1, AWR Snapshot Period 2, Analysis Report, SQL Commonality, Regressed SQL, I/O Bound, Undersized SGA]
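Two of the ideas named for Compare Period ADDM, SQL commonality and regressed SQL, can be illustrated with a simple comparison of two periods. This is a rough sketch and not ADDM's algorithm: the inputs are assumed to be per-statement average elapsed times for each period, and `compare_periods` and the 1.5x regression factor are hypothetical.

```python
from typing import Dict, Tuple


def compare_periods(period1: Dict[str, float],
                    period2: Dict[str, float],
                    factor: float = 1.5) -> Tuple[float, Dict[str, float]]:
    """Return (commonality, regressed) for two {sql_id: avg elapsed} mappings.

    commonality: fraction of all statements seen that ran in both periods.
    regressed:   {sql_id: slowdown ratio} for common statements that became
                 at least `factor` times slower in the second period.
    """
    common = period1.keys() & period2.keys()
    everything = period1.keys() | period2.keys()
    commonality = len(common) / len(everything) if everything else 0.0
    regressed: Dict[str, float] = {}
    for sql_id in sorted(common):
        before, after = period1[sql_id], period2[sql_id]
        if before > 0 and after / before >= factor:
            regressed[sql_id] = after / before
    return commonality, regressed
```

A low commonality suggests the workload itself changed between the periods, in which case a statement-by-statement comparison says less than it would for two periods running the same SQL.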

New Exadata AWR enhancements

SQL Advisors

• SQL Tuning Set
  – Capture of SQL in the database
  – Ideally captured after a representative workload has been run

• Access Advisor
  – Analyses access patterns of SQL in the cache, or from a defined workload, and gives recommendations on how to (re)structure the database objects

• Performance Analyzer
  – Analyses before and after images of SQL in a SQL Tuning Set (testing scenarios)
  – Compares results

• Tuning Advisor
  – Requires Tuning Sets (collection of SQL from the database)
  – Analyses the SQL from a SQL Tuning Set and gives recommendations

IORM

Summary

How do I monitor my Exadata environment? What parameters are most important?

KPIs for Exadata Compute, Cells and Infiniband Switches

What thresholds do I set to monitor my Exadata using Enterprise Manager?

Enterprise Manager 13 - Adaptive Thresholds

How do I diagnose a performance problem involving Exadata?

Incident Review, Baseline comparison, Drill Down

Want to Learn More?

Whitepapers and Links:

http://www.oracle.com/goto/maa (Enterprise Manager)

Engineered Systems Manageability Section

http://www.oracle.com/technetwork/database/availability/exadata-health-resource-usage-2021227.pdf

-and others…

https://blogs.oracle.com/EMMAA/

Contact Me

Email: [email protected]

Twitter: @jviscusi

LinkedIn: Jim Viscusi