deconstructing the teradata explain output


Upload: scott-mario

Post on 11-Mar-2016


Deconstructing the Teradata explain output

If you are new to Teradata, performance optimization of a single SQL statement probably seems quite challenging, and you may not know where to start.

Happily, we have a very useful tool that gives us the information needed for this task: the EXPLAIN statement.

Together with some basic knowledge of how SQL statements are executed on the system, you will soon be able to understand every query running on your system and track down performance issues quickly (solving them, however, is another challenge).

Although you may feel overwhelmed when working with the explain tool for the first time, all you need to do is deconstruct the parts that make up the execution plan.

But first, let’s see how Teradata breaks a SQL statement down into a number of atomic operations, which are repeated over and over during query execution until a result set is finally delivered.

The principle is easy:

Take two tables (or spools, which are temporary tables used by Teradata during query execution) and join them, creating a target spool. Keep joining pairs of spools until the whole query is covered. It’s as simple as that. Each target spool is the input to the next step in the execution plan until execution is finished.
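The principle above can be sketched in a few lines of Python. This is only a toy model: the table names, columns and the helper functions `join_step` and `run_plan` are made up for illustration and have nothing to do with how Teradata implements its steps internally.

```python
def join_step(left, right, key):
    """Join two row sets (lists of dicts) on `key`; the result is the target spool."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def run_plan(first, steps):
    """Run two-input join steps in sequence; each target spool feeds the next step."""
    spool = first
    for table, key in steps:
        spool = join_step(spool, table, key)
    return spool


# Hypothetical sample tables.
customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 1},
    {"order_id": 12, "cust_id": 2},
]
items = [{"order_id": 10, "sku": "A"}, {"order_id": 12, "sku": "B"}]

# Step 1 joins customers and orders into a spool; step 2 joins that spool with items.
result = run_plan(customers, [(orders, "cust_id"), (items, "order_id")])
for row in result:
    print(row)
```

A three-table query becomes two join steps here, which is the same shape you will find in the explain output: a chain of steps, each reading at most two inputs and writing one spool.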

Further, some steps are only related to data retrieval involving just one table.

Again, this is a very simplified picture, but it is more than sufficient for performance optimization at the query level.

Now let’s proceed to the next step: reading and understanding the plain-text output delivered by the EXPLAIN statement. If you look at it for the first time, you may not recognize any structure in it.

Again, we will deconstruct the output into its pieces to carve out the important information behind it.

First of all, the output of explain shows exactly the sequence of two-table joins mentioned above. This is our first level of breakdown. Armed with this knowledge about blocks of two-table joins, we can now dig deeper into one of these blocks and analyze what is happening in detail.

Operations:

Each of these blocks represents a certain operation on Teradata, mainly retrieving rows, joining the rows of two tables, or performing an aggregation. Together with the type of operation, we get information about how many AMPs take part: for example, All-AMPs, Single-AMP, or Group-AMP.

You will see in the explain output for each block something like:

All-AMPs retrieve step, Single-AMP retrieve step, All-AMPs join step, Single-AMP join step, All-AMPs step to aggregate.
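To make this first level of analysis concrete, here is a small Python sketch that pulls the AMP scope and the operation type out of a line of explain text. The pattern only covers the phrases listed above, and the function name `classify_step` is an invention for this example; real explain output has more variants than this.

```python
import re

# Matches only the step phrases quoted in the text above.
STEP_RE = re.compile(
    r"(?P<scope>all-amps?|single-amp|group-amps?)\s+"
    r"(?:(?P<op>retrieve|join)\s+step|step\s+to\s+(?P<agg>aggregate))",
    re.IGNORECASE,
)


def classify_step(text):
    """Return (amp_scope, operation) for a step description, or None if not recognized."""
    match = STEP_RE.search(text)
    if match is None:
        return None
    # Normalize "all-amps" and "all-amp" to the same label.
    scope = match.group("scope").lower().rstrip("s")
    operation = (match.group("op") or match.group("agg")).lower()
    return scope, operation


for line in [
    "All-AMPs retrieve step",
    "Single-AMP join step",
    "All-AMPs step to aggregate",
]:
    print(line, "->", classify_step(line))
```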

Now that we know which kinds of operations exist, we can start to deconstruct each of them to get further important details.

Each operation may be preceded by some preparation steps, mainly redistribution and sorting of rows. Sorting is a useful step that makes it possible to apply more sophisticated search algorithms to the data.

Join Preparation:

Redistribution is needed because join steps are done by the AMPs holding the rows to be joined. As AMPs are independent processes with no access to the disks of other AMPs, the rows to be joined from the two tables have to be handled by the same AMP. The easiest way to achieve this is to rehash one or both tables taking part in the join.
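The idea behind rehashing can be illustrated with a toy model in Python. Note the assumptions: `zlib.crc32` is only a stand-in for Teradata's real hashing, the number of AMPs and the sample rows are made up, and `amp_for` and `redistribute` are hypothetical helper names.

```python
import zlib


def amp_for(value, n_amps):
    """Pick an AMP number from a join-column value (stand-in for the real row hash)."""
    return zlib.crc32(str(value).encode("utf-8")) % n_amps


def redistribute(rows, column, n_amps):
    """Place every row on the AMP selected by the hash of `column`."""
    amps = {amp: [] for amp in range(n_amps)}
    for row in rows:
        amps[amp_for(row[column], n_amps)].append(row)
    return amps


N_AMPS = 4  # arbitrary for this example
customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 1},
    {"order_id": 12, "cust_id": 2},
]

# Hash both tables on the join column: matching rows end up on the same AMP.
customers_by_amp = redistribute(customers, "cust_id", N_AMPS)
orders_by_amp = redistribute(orders, "cust_id", N_AMPS)

# Every AMP joins only its own local rows and never looks at another AMP.
joined = [
    {**c, **o}
    for amp in range(N_AMPS)
    for c in customers_by_amp[amp]
    for o in orders_by_amp[amp]
    if c["cust_id"] == o["cust_id"]
]
print(joined)
```

Because both tables are hashed on the same column with the same function, no matching pair is ever split across AMPs, so the purely local joins still find every match.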

You will see something like the following in the explain output for row movement:

duplicated on all AMPs, redistributed by the hash code of (new hash column(s)), redistributed by rowkey etc.

You will see something like the following in the explain output for sorting:

sort to order by hash code, sort to order by row hash, sort to partition by rowkey etc.

Row retrieval strategy:

One important piece of information that all operations carry is how rows are retrieved from disk. As you know, there are several access paths for getting data out of the Teradata Database, such as full table scans, primary index access, secondary index access etc.

You will see something like the following in the explain output for row retrieval:

by way of all-row scan, by way of rowhash match scan, by way of the primary index, by way of hash value etc.

Join Type:

Finally, if the operation is a join, the explain output will tell you exactly which join strategy was chosen:

using a product join, using a single partition hash join, using a merge join, using a rowkey based merge join etc.
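To get a feeling for why the join strategy matters, and why a sort is often done beforehand, the following Python sketch contrasts textbook versions of a product join and a merge join. These are generic algorithms written for illustration, not Teradata's internal implementation, and the sample rows are made up.

```python
def product_join(left, right, key):
    """Compare every left row with every right row."""
    return [(l, r) for l in left for r in right if l[key] == r[key]]


def merge_join(left, right, key):
    """Sort both inputs on `key`, then walk them in step and pair equal keys."""
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Pair the current left row with the whole run of equal right keys.
            k = j
            while k < len(right) and right[k][key] == lk:
                out.append((left[i], right[k]))
                k += 1
            i += 1
    return out


left_rows = [{"k": 2, "l": "b"}, {"k": 1, "l": "a"}, {"k": 2, "l": "c"}]
right_rows = [{"k": 2, "r": "x"}, {"k": 3, "r": "y"}, {"k": 2, "r": "z"}]


def labels(pairs):
    """Reduce joined pairs to sortable labels so the results can be compared."""
    return sorted((l["l"], r["r"]) for l, r in pairs)


print(labels(product_join(left_rows, right_rows, "k")))
print(labels(merge_join(left_rows, right_rows, "k")))
```

Both functions return the same pairs, but the product join always performs len(left) times len(right) comparisons, while the merge join skips over non-matching keys once the inputs are sorted.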

Confidence:

Each block also states the level of confidence in its estimate of the number of resulting rows. High confidence is best but is not always possible.

In the explain output you will see something like:

high confidence, low confidence, no confidence and index join confidence

This was a description of the basic building blocks you get after deconstructing the output of an explain statement. It is far from complete, but it probably covers 80% of the important information available in any explain output.
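As a closing illustration, the phrase families described in this article can be collected into a small lookup that tags a block of explain text. This is a minimal sketch: the phrase lists contain only the examples quoted above, `tag_step` is a hypothetical helper, and the sample text is assembled by hand from those phrases rather than copied from a real system.

```python
# Phrase families from the article; more specific phrases are listed first.
PHRASES = {
    "row movement": [
        "duplicated on all amps",
        "redistributed by the hash code of",
        "redistributed by rowkey",
    ],
    "sorting": [
        "sort to order by hash code",
        "sort to order by row hash",
        "sort to partition by rowkey",
    ],
    "retrieval": [
        "all-row scan",
        "rowhash match scan",
        "by way of the primary index",
        "hash value",
    ],
    "join type": [
        "product join",
        "single partition hash join",
        "rowkey based merge join",
        "merge join",
    ],
    # Both word orders of the fourth confidence level are accepted.
    "confidence": [
        "high confidence",
        "low confidence",
        "no confidence",
        "index join confidence",
        "join index confidence",
    ],
}


def tag_step(text):
    """Return the first matching phrase per family found in one block of explain text."""
    lowered = " ".join(text.lower().split())
    tags = {}
    for family, phrases in PHRASES.items():
        for phrase in phrases:
            if phrase in lowered:
                tags[family] = phrase
                break
    return tags


# Hand-written sample, NOT real Teradata output.
sample = (
    "All-AMPs join step from Spool 2 by way of all-row scan, which is "
    "redistributed by the hash code of (cust_id), using a merge join. "
    "The result size is estimated with low confidence."
)
print(tag_step(sample))
```

Running such a tagger over each block of a long plan gives a compact summary per step, which makes it easier to spot the steps worth a closer look, such as product joins or estimates with no confidence.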