
Partitioned Primary Indexes

Jerry Klindt

(updated by Paul Sinclair)

October 20, 2004

Data Warehousing > Database

Introduction

Some common business queries generally require a full-table scan of a large table even though it's predictable that a fairly small percentage of the rows will qualify.

One example of such a query is a trend analysis application that compares current month sales to the previous month, or to the same month of the previous year, using a table with several years of sales detail. Another example is an application that compares customer behavior in one geographic region to another region.

Prior to Teradata Database V2R5, there were few viable opportunities for a Database Administrator (DBA) to structure the data warehouse in a manner that allowed such queries to avoid full-table scans. Starting with Teradata Database V2R5, the DBA has a flexible and powerful tool to structure tables to allow automatic optimization of frequently used queries of this class. That tool is the partitioned primary index (PPI). A PPI allows a table to be partitioned on columns of interest while retaining the traditional use of the primary index (PI) for data distribution and efficient access when the PI values are specified in the query.

A carefully chosen partitioning expression can result in partial-table scans instead of full-table scans with dramatic improvements in resource consumption and elapsed time (elapsed time decreases of 99% or more are possible). Batch insert and update times may also be improved when the partitioning column is chosen to match the arrival pattern of the data (elapsed time decreases of 90% or more are possible).

EB-1889 > 1204 > PAGE 2 OF 14


Table of Contents

Executive Summary
Introduction
Definitions and Basics
How Much Can PPI Improve Performance?
How PPI Solves the Business Problem: The First Example
Can the First Example Be Improved Further?
A Second Example
A Final Example
Specifics of Defining a PPI Table
High-Level Partitioning Guidelines
High-Level Trade-off Considerations
Summary

Executive Summary

Partitioned primary indexes, introduced in Teradata Database V2R5, provide an opportunity to greatly improve the performance of certain queries and of high-volume insert, update, and delete operations. The feature is flexible, yet easy to use, and is largely transparent to end users.

The process for physically defining the partitioning expression, via the CREATE TABLE statement, is simple and straightforward. This paper gives some examples.

As is true for all physical database design decisions, there are trade-off considerations associated with each possible choice. It's beyond the scope of this paper to discuss the trade-off considerations at length. The objective of this paper is to provide realistic examples and actual performance comparisons using PPI and non-PPI solutions.

Definitions and Basics

In the context of PPI, partitioning refers to the physical ordering of rows within the table. The ordering is automatically provided by the database management software, and is determined by a user-specified expression called the partitioning expression. Physically, a PPI table is substantially the same as a non-PPI table except for the ordering of rows. More specifically, the PI value is hashed to distribute a row to a particular AMP in an identical fashion for PPI and non-PPI tables. Within each AMP, rows are ordered by PI hash for non-PPI tables, and by partition number first and then PI hash for PPI tables.
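The ordering rule above can be modeled in a few lines. This sketch is a minimal illustration, not Teradata's implementation. It assumes a stand-in hash (`zlib.crc32`) in place of the real row-hash function, and it uses a hypothetical `month` attribute as the partition number. Sorting one AMP's rows by (partition number, PI hash) makes each partition's rows contiguous, which is what later allows a partition to be read or deleted as a unit.

```python
import zlib

def row_hash(pi_value: str) -> int:
    # Stand-in hash only: Teradata's row-hash function is not crc32.
    return zlib.crc32(pi_value.encode("utf-8"))

def order_non_ppi(rows):
    # Non-PPI table: rows on an AMP are ordered by PI hash alone.
    return sorted(rows, key=lambda r: row_hash(r["pi"]))

def order_ppi(rows, partition_of):
    # PPI table: partition number first, then PI hash within the partition.
    return sorted(rows, key=lambda r: (partition_of(r), row_hash(r["pi"])))

# Hypothetical rows for one AMP; "month" plays the role of the partition number.
rows = [{"pi": f"item{i}", "month": i % 3 + 1} for i in range(12)]

non_ppi = order_non_ppi(rows)
ppi = order_ppi(rows, lambda r: r["month"])
print([r["month"] for r in ppi])  # each month's rows form one contiguous run
```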

The partitioning expression is specified on the CREATE TABLE statement in a PARTITION BY clause following the PRIMARY INDEX definition. The result of the expression must be an integer value or a value that can be cast to integer, and the result indicates the partition number. The columns referenced in the partitioning expression are called the partitioning columns. A partition number must be between 1 and 65,535, inclusive; therefore, the maximum number of partitions that can be defined for a table is 65,535.

Accessing a particular partition of a table means accessing a subset of the table beginning with the data block containing the first row belonging to the partition (on each AMP), and extending to the data block containing the last row belonging to the partition. The number of data blocks will be zero if there are no rows belonging to that partition (although it may be necessary to read one data block to determine that there are no rows for the partition).

The term partition elimination refers to an automatic optimization in which the Optimizer determines, based on query conditions and the partitioning expression, that some partitions cannot contain qualifying rows, and causes those partitions to be skipped. Partitions that are skipped for a particular query are called eliminated partitions. Generally, the greatest benefit of a PPI table is obtained from partition elimination.
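Here is a simplified model of range-based partition elimination. Given each partition's inclusive low and high bounds and a query's BETWEEN predicate, only partitions whose bounds overlap the predicate need to be read. The quarterly ranges, dates, and the function name `eliminate` are hypothetical. The real Optimizer works from the partitioning expression itself, not from a list like this.

```python
from datetime import date

def eliminate(partition_ranges, query_lo, query_hi):
    # Return the 1-based numbers of partitions whose [lo, hi] bounds overlap
    # the predicate query_lo <= column <= query_hi; all others are eliminated.
    survivors = []
    for number, (lo, hi) in enumerate(partition_ranges, start=1):
        if lo <= query_hi and hi >= query_lo:
            survivors.append(number)
    return survivors

# Hypothetical quarterly partitions on a date column.
quarters = [
    (date(2004, 1, 1), date(2004, 3, 31)),
    (date(2004, 4, 1), date(2004, 6, 30)),
    (date(2004, 7, 1), date(2004, 9, 30)),
    (date(2004, 10, 1), date(2004, 12, 31)),
]
print(eliminate(quarters, date(2004, 5, 15), date(2004, 7, 10)))  # [2, 3]
```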

The term direct merge join is used to describe a join in which the table of interest is not spooled in preparation for a merge join. The Optimizer may choose a direct merge join when all columns of the PI are specified in equality join terms.

The term direct product join is used to describe a join in which the table of interest is not spooled in preparation for a product join. The Optimizer may choose a direct product join when all the partitioning columns are specified in equality join terms.

The ordering of rows within a table is transparent to application developers, but there are trade-off considerations involving queries with partitioning column conditions, queries that specify one or a few PI values, and queries that perform joins on the PI columns. We will briefly discuss these trade-off considerations in subsequent sections.

How Much Can PPI Improve Performance?

The performance gain depends on the number of partitions and the specific query being measured. In the best case, the elapsed time of a specific query against a single table can shrink to approximately 1/n of its non-PPI elapsed time, where n is the number of partitions in the table. With as many as 65,535 partitions, best-case PPI queries can therefore take less than 1/100 of one percent of the time they would take with a non-PPI table. The best performance improvement occurs when there are many partitions with reasonably even distribution of rows among the partitions, and partition elimination excludes all except one partition.
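The best-case arithmetic can be written as a short function. It is a minimal sketch that assumes equally sized partitions and ignores query start-up and answer-return overhead. The function name is hypothetical.

```python
def best_case_elapsed_fraction(num_partitions: int) -> float:
    # Idealized fraction of the non-PPI scan time when partition elimination
    # leaves exactly one of num_partitions equally sized partitions.
    if not 1 <= num_partitions <= 65535:
        raise ValueError("partition count must be between 1 and 65,535")
    return 1 / num_partitions

for n in (25, 200, 65535):
    print(n, f"{best_case_elapsed_fraction(n):.6%}")
```

With 65,535 partitions the fraction is about 0.0015%, which is below the 0.01% ("1/100 of one percent") threshold mentioned above.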

Figure 1 shows the results of actual performance tests. The Baseline column is the performance for a non-PPI table, and the PPI column is the performance for a PPI counterpart table. These tests are considered to be realistic, but your results may vary.

Figure 1. Actual Performance Test Results

Test Description | Baseline | PPI | Improvement
Select rows that have a specified value of the partitioning column (200 partitions with roughly the same number of rows each) | 59 seconds | one second | 98% reduction in elapsed time
Select a month of activity from one partition containing six months of data (11 years of data contained in 40 partitions of unequal size) | 58 seconds | two seconds | 96% reduction in elapsed time
Delete rows that have a specified value of the partitioning column (200 partitions of equal size) | 239 seconds | one second | more than 99% reduction in elapsed time
Update one column in each row that has a specified value of the partitioning column (200 partitions of equal size) | 237 seconds | three seconds | 98% reduction in elapsed time
MultiLoad insert a number of rows equal to 1% of the table size into one partition (of 200) | 1394 rows per second per node | 14,742 rows per second per node | more than ten times faster
MultiLoad insert a number of rows equal to 1% of the table size into one partition (of 200), with one NUSI defined on the table | 841 rows per second per node | 5666 rows per second per node | more than six times faster

For the MultiLoad tests, larger numbers of rows per second per node are better.

How PPI Solves the Business Problem: The First Example

We start the discussion of when a PPI is most appropriate by showing the differences between a PPI and non-PPI table for a few examples. For the first example, we stipulate a table and some processing requirements, discuss the options available prior to Teradata Database V2R5, and discuss the optimization opportunities a PPI provides.

Our hypothetical company has a large sales table containing the details of each transaction for the previous 24 full months plus the current month-to-date. Once per month, the transactions from the oldest month are deleted. Current transactions are added to the table nightly using Teradata MultiLoad. Most transactions are added on the date they occur, but a small percentage of transactions may be reported a few days after they occur. The number of transactions per month is roughly the same for all months.

Each row contains, among other things, the product code for the item, the transaction date, an identifier for the sales agent, and the quantity sold. The rows are short, and the data blocks are large. The PI is a composite of product code, transaction date, and the agent identification. The non-PPI definition of this table, showing only a few of the most important columns, is as follows:

CREATE TABLE SalesTable (
  product_code CHAR(8),
  sales_date DATE,
  agent_id CHAR(8),
  quantity_sold INTEGER,
  other_columns CHAR(50))
PRIMARY INDEX (product_code, sales_date, agent_id);

There are four major categories of queries against this table:

> A modest number of short-running queries specify the PI values.
> Many ad hoc queries have the following general pattern:
  - Compare one month of activity to another month, or
  - Compare current-month-to-date sales to the same days of the previous month or to the same days of the same month of the previous year for a few product code values.
> Some queries analyze agent performance, usually over an interval of a calendar quarter or less.
> Some queries examine sales trends over the previous 24 full months, usually for most or all product code values.

No other tables have the same PI definition. The sales table is frequently joined to relatively small tables containing information about each product code and each sales agent.

Prior to Teradata Database V2R5, the DBA needed to speed up the ad hoc queries and agent analysis queries. The DBA considered creating a value-ordered secondary index or join index on the transaction date column, and set up tests for those scenarios. After running and analyzing EXPLAINs, the DBA found that the Optimizer had determined that neither index was selective enough to be an improvement over a full-table scan.

The DBA then considered splitting the table into 25 separate tables, each containing transactions for a calendar month. Then, the DBA would create a view with a UNION of all the tables for use by the applications that analyze 24 months of sales history. The DBA concluded that this solution could indeed speed up the targeted queries, but it added too much complexity for the end users. Users would have to understand the structure and change the table names in their queries, code more complicated UNION statements, and select appropriate date ranges and product code ranges. The need to know the appropriate table name (from the 25 different tables) would also apply to applications submitting short-running queries that specify the primary index. This solution would also complicate nightly load jobs, especially in the first few days of a month when a few of the transactions would be from the prior month. The solution would also complicate the archive strategy. In the end, this solution was rejected as being too complicated and error-prone.

With PPI, there's an excellent solution for this example scenario. By adding a PARTITION BY clause to the definition of the replacement PPI table, it would be easy to create 25 partitions, one for each month (assuming the current date is in October 2004):

CREATE TABLE PPI_SalesTable (
  product_code CHAR(8),
  sales_date DATE,
  agent_id CHAR(8),
  quantity_sold INTEGER,
  other_columns CHAR(50))
PRIMARY INDEX (product_code, sales_date, agent_id)
PARTITION BY RANGE_N (sales_date BETWEEN
  DATE '2002-10-01' AND DATE '2004-10-31' EACH INTERVAL '1' MONTH);

The RANGE_N function was used in this scenario to specify the beginning and ending dates and the granularity of the partitioning.
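To make the RANGE_N expression concrete, here is a minimal Python model of how it maps a sales date to a partition number. It is an illustration under the stated assumption that each range is one calendar month starting with October 2002. The function name `monthly_partition` is hypothetical. Returning `None` stands in for a date that falls outside every defined range, because such a row could not be placed in any partition of the real table.

```python
from datetime import date
from typing import Optional

FIRST_YEAR, FIRST_MONTH = 2002, 10   # DATE '2002-10-01'
LAST_DAY = date(2004, 10, 31)        # DATE '2004-10-31'

def monthly_partition(sales_date: date) -> Optional[int]:
    # Model of RANGE_N(sales_date BETWEEN DATE '2002-10-01' AND DATE '2004-10-31'
    # EACH INTERVAL '1' MONTH): partition numbers run from 1 to 25.
    if sales_date < date(FIRST_YEAR, FIRST_MONTH, 1) or sales_date > LAST_DAY:
        return None
    months_after_first = (sales_date.year - FIRST_YEAR) * 12 + (sales_date.month - FIRST_MONTH)
    return months_after_first + 1

print(monthly_partition(date(2003, 10, 15)))  # thirteenth monthly range
```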

By converting the sales table into a table partitioned by transaction month, many of the queries would run faster (in this scenario) with no significant negative trade-off considerations. Let's examine each element of the stated workload as it applies to the newly partitioned table in more detail.



Faster Monthly Deletes

Instead of using Teradata MultiLoad to delete rows, the DBA could submit an ALTER TABLE statement on a monthly basis to drop the oldest partition and delete its rows, and at the same time create a new partition that would contain data for the upcoming month. Additional partitions for future months could be added if desired. A delete of all the rows in a partition is optimized in much the same way that a delete of all rows in a table has historically been optimized. In both cases, there is no need to record the individual rows in the transient journal as they're deleted.

The rows for the month being deleted are physically stored contiguously (on each AMP) instead of being scattered more or less evenly among all the data blocks, as in the non-PPI table, so there would be fewer data blocks with rows to be deleted. Most of the deletes would be full-block deletes, so those data blocks would not have to be read or rewritten. At most one data block per AMP would contain rows for both the oldest month and the second-oldest month, and that would be the only data block read, updated, and rewritten. There is also no need to touch any of the rows in the other month partitions.

Dropping the oldest partition(s) with an ALTER TABLE statement is a nearly instantaneous operation, provided that there are no secondary indexes or join indexes that require updates, there are no retained or added partitions (such as NO RANGE) to which the rows must be moved, and the option to make a copy of the deleted rows is not specified. For example, to drop the partition and delete the rows for October 2002, and create a partition for November 2004, you would submit:

ALTER TABLE PPI_SalesTable MODIFY PRIMARY INDEX (product_code, sales_date, agent_id)
  DROP RANGE BETWEEN
    DATE '2002-10-01' AND DATE '2002-10-31'
  ADD RANGE BETWEEN
    DATE '2004-11-01' AND DATE '2004-11-30'
  WITH DELETE;
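The monthly maintenance cycle amounts to a sliding window over month ranges. The helper below is hypothetical and tracks only the list of month start dates that the partitions cover. It shows that each DROP RANGE/ADD RANGE pair keeps the window at 25 months while moving it forward by one month. It does not model the delete itself.

```python
from datetime import date

def next_month(d: date) -> date:
    # First day of the month after d's month.
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)

def roll_window(months):
    # Drop the oldest monthly range and add the month after the newest one,
    # mirroring DROP RANGE ... ADD RANGE ... WITH DELETE.
    return months[1:] + [next_month(months[-1])]

window = [date(2002, 10, 1)]
while len(window) < 25:
    window.append(next_month(window[-1]))

rolled = roll_window(window)
print(window[0], window[-1], "->", rolled[0], rolled[-1])
```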

Faster Teradata MultiLoad Inserts

The nightly Teradata MultiLoad insert job would run faster than it did for the non-PPI table. Instead of the inserted rows distributing more or less evenly among all the data blocks of the table, as with the non-PPI table, the inserted rows would be concentrated in data blocks for the proper month. This would increase the average "hits per block" count (a key measure of Teradata MultiLoad efficiency) and reduce the number of data blocks that must be read and rewritten.
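A rough way to see the hits-per-block effect is a balls-in-bins estimate. The sketch assumes that each inserted row lands independently and uniformly at random in one of a fixed number of blocks, and it ignores block splits and growth. The block and row counts are hypothetical. With the rows confined to one month's blocks, the estimated hits per block rise from roughly 1.1 to roughly 5.

```python
def expected_blocks_touched(rows: int, blocks: int) -> float:
    # Expected number of distinct blocks hit when each inserted row lands in
    # one of `blocks` blocks independently and uniformly at random.
    return blocks * (1 - (1 - 1 / blocks) ** rows)

def hits_per_block(rows: int, blocks: int) -> float:
    # Average inserted rows per block that must be read and rewritten.
    return rows / expected_blocks_touched(rows, blocks)

# Hypothetical per-AMP sizes: 25,000 blocks in the whole table, about 1,000
# blocks (1/25 of the table) for one month, and 5,000 rows inserted per night.
ROWS_PER_NIGHT = 5_000
non_ppi_hits = hits_per_block(ROWS_PER_NIGHT, 25_000)  # rows scattered table-wide
ppi_hits = hits_per_block(ROWS_PER_NIGHT, 1_000)       # rows confined to one month
print(f"non-PPI: {non_ppi_hits:.2f} hits/block, PPI: {ppi_hits:.2f} hits/block")
```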

Virtually No Change to Short-Running Queries

Short-running queries that specify primary index values would run approximately as fast as on the non-PPI table. Since the partitioning column is part of the primary index, a query that specifies all the PI values also determines the single partition that can contain the row, so PI access performance would not be significantly changed.

Significant Performance Gains in Ad hoc Queries

Large gains would be seen in ad hoc queries that, for example, compare a recent month of sales data to a prior month. Due to partition elimination, only two of the 25 partitions would be read instead of the full-table scan required on the non-PPI table. This means that the number of disk reads would be reduced by roughly 92%, with a proportional reduction in elapsed time. The 92% figure applies to the step that reads the sales table, not to the sum of all the steps used to accomplish the query. Given the stated assumptions, the other steps should take roughly the same amount of time as for the non-PPI table.

The same considerations apply to the agent analysis queries. The number of partitions read is determined by the time period specified in the query. Even if the analysis is for twelve full months, there is still roughly a 50% gain in reading twelve of 25 partitions for the step that reads the sales table.
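The percentages above follow from simple arithmetic. This sketch assumes partitions of roughly equal size, which is the scenario's stated assumption. The function name is hypothetical.

```python
def scan_reduction(partitions_read: int, total_partitions: int) -> float:
    # Fraction of the sales-table read step avoided by partition elimination,
    # assuming partitions of roughly equal size.
    return 1 - partitions_read / total_partitions

print(f"{scan_reduction(2, 25):.0%}")   # current month vs. prior month
print(f"{scan_reduction(12, 25):.0%}")  # twelve-month agent analysis
print(f"{scan_reduction(24, 25):.0%}")  # 24-month trend analysis
```

Reading 2, 12, or 24 of the 25 partitions avoids about 92%, 52%, and 4% of the read step, respectively.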

No Degradation to Queries Requiring a Full-Table Scan

Decision support queries that analyze 24 months of sales data would take roughly the same time and resources as for the non-PPI table. There would be a small gain from reading 24 instead of 25 partitions. If the analysis is for 24 months plus the current month (i.e., the entire table), the resource usage is the same as for the non-PPI table.



Virtually No Degradation for Joins

Joins would take roughly the same amount of time. In this example, since there are no other tables with the same primary index, there are no direct merge joins to the sales table. Joins to the product table and agent table would most likely use the same join strategy as when the sales table was not partitioned. The join strategy would typically be either a duplication of a small table followed by a product join to the sales table, or a redistribution of a spool file followed by a merge join. Neither strategy is less efficient with the partitioning of the sales table. Joins could even be faster depending on the specific query conditions and the possibility of partition elimination.

More Efficient Archiving and Restoring

In Teradata Database V2R6, partitions can also be selectively archived, restored, and copied. This can significantly reduce the time to archive data by only archiving the recently changed partitions. Restores of selected partitions can be used to quickly reload critical partitions.

Additional Disk Space Required

The partitioned sales table would require somewhat more disk space than the nonpartitioned counterpart due to the two-byte partition number recorded in each row. For this example, the percentage of increase would be less than 3%.
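The size of the increase depends only on the row width, which the example does not state beyond "short." The widths below are therefore hypothetical. The sketch also treats block-count growth as proportional to row growth, which ignores block packing effects. Rows wider than about 67 bytes keep the increase under 3%, and a 100-byte row grows by 2%.

```python
PARTITION_NUMBER_BYTES = 2  # partition number stored in each row of a PPI table

def row_growth_pct(row_bytes: int) -> float:
    # Percentage growth in row size caused by the two-byte partition number.
    return 100 * PARTITION_NUMBER_BYTES / row_bytes

for width in (50, 70, 100, 200):  # hypothetical row widths in bytes
    print(width, f"{row_growth_pct(width):.1f}%")
```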

Figure 2 summarizes the improvement opportunities for the example.



Figure 2. Example of PPI Improvement Opportunities

Activity | Non-PPI Table | PPI Table | Improvement | Comments
Nightly inserts | Inserted rows scattered throughout table | Inserted rows concentrated in one partition | Faster performance | No changes to load script needed
Monthly delete of one month of data | MultiLoad job reads most data blocks, updates most data blocks | ALTER TABLE statement deletes partition | Much faster performance | Easier maintenance
Primary index access | One data block read | One data block read | No change | No SQL changes needed
Comparison of current month to prior month | All data blocks read | Two partitions read | Step is 12 times faster (two partitions of 25 read) | No SQL changes needed
Trend analysis over entire table | All data blocks read | All data blocks read | Little change | Rows are two bytes longer for PPI; 2% more data blocks for 100-byte rows
Joins | No direct merge joins | No direct merge joins | Little change | No direct merge joins due to choice of primary index
Archive/Restore (in Teradata Database V2R6) | Entire table | Entire table or selected partitions | Faster archives for selected partitions | Saves having to re-archive data already archived

Can the First Example Be Improved Further?

The first PPI solution, outlined above, was to partition by month, since many of the queries use a month as their basic unit of time. Another option to consider is partitioning to a finer level. Let's compare partitioning by month to partitioning by day using the following PARTITION BY clause:

PARTITION BY RANGE_N (sales_date BETWEEN
  DATE '2002-10-01' AND DATE '2004-10-31' EACH INTERVAL '1' DAY);

The table would now have about 760 partitions (two years of about 365 days each plus the current month of about 30 days). Some small number of partitions, the ones corresponding to future dates in the current month, would be empty.
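The partition counts for the two RANGE_N definitions can be checked with Python's standard `datetime` module. The daily figure is exact: the two full years include February 29, 2004, which brings the total to 762 partitions.

```python
from datetime import date

start, end = date(2002, 10, 1), date(2004, 10, 31)

# EACH INTERVAL '1' DAY: one partition per calendar day, both ends inclusive.
daily_partitions = (end - start).days + 1

# EACH INTERVAL '1' MONTH: one partition per calendar month, both ends inclusive.
monthly_partitions = (end.year - start.year) * 12 + (end.month - start.month) + 1

print(daily_partitions, monthly_partitions)  # 762 25
```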

Virtually No Impact to the Monthly Deletes

The monthly process deleting the oldest month of data would be virtually the same. Depending on the month, between 28 and 31 smaller partitions would be deleted instead of one larger partition. However, the same number of rows would be deleted, and the run time for the job would be roughly the same.

Faster Nightly Inserts

Nightly inserts would benefit from the finer partitioning. Instead of being concentrated in one or two of the 25 large partitions, as with monthly partitioning, the rows would be inserted into three to five smaller partitions of the 760 daily partitions, well under one percent of the total. Most of the inserts would be directed to the one partition that contains the day's activity. This would increase the hits per block, thereby improving the performance of the inserts.

No Impact to Short-Running Queries

Having 760 partitions instead of 25 would not impact short-running PI access queries. This is because in this example the partitioning column is part of the primary index. In other situations, there could be a significant impact.

Modest Improvement for Some Ad hoc Queries

Ad hoc queries that analyze two full months of data would not be impacted. They would now access about 60 partitions out of the 760, instead of two out of 25, roughly the same percentage of the table. However, when queries vary by the time of month, there would be some gain from having the larger number of partitions. For example, a query submitted on the fifth day of the current month might analyze four days for each of two months, while a query submitted on the last day of the current month might analyze about 30 days for each of two months. Instead of two out of 25 monthly partitions (the current month-to-date plus the full prior month, between 32 and 36 days of data), the query on the fifth day of the current month would involve eight out of 760 partitions (eight days of data), which is a smaller percentage of the table. The query at the end of the month would examine about 60 out of 760 partitions, which is substantially the same as two out of 25 monthly partitions.
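Here is the fifth-of-the-month case worked through for one concrete date. The query date of October 5, 2004 is hypothetical, and the sketch assumes that the October partition holds four loaded days at that point. With daily partitions the query reads 8 days of data. With monthly partitions it reads 34 days, which falls within the 32-to-36-day range given above.

```python
from datetime import date

def days_in_month(year: int, month: int) -> int:
    # Days between the first of this month and the first of the next month.
    first_of_next = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return (first_of_next - date(year, month, 1)).days

# Hypothetical query on 2004-10-05 comparing October 1-4 with September 1-4.
days_per_month_compared = 4
daily_partitions_read = 2 * days_per_month_compared

# With monthly partitions, both whole partitions are read: October holds four
# loaded days so far, and September holds all of its days.
monthly_days_read = days_per_month_compared + days_in_month(2004, 9)

print(daily_partitions_read, monthly_days_read)  # 8 34
```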

Analysis queries that examine 24 months of data would run in about the same time, since they examine most of the table in either case.

The number of partitions would not significantly impact the joins, since there are no direct merge joins against this table in this scenario.

In summary, for this example, having a larger number of smaller partitions would produce modest gains and no degradation to performance. The greatest gains would be for queries that analyze only a few days of transactions, and for the nightly loads. Additionally, in Teradata Database V2R6, a day's transactions (that is, a small partition of data) could be selectively archived or restored.



A Second Example

While transaction date is frequently a good choice for the partitioning column, it is not the only choice. Let's consider a telephone company's table with detailed information about phone calls. There is a row for each outgoing call with the originating phone number, the timestamp for the start of the call, and the call duration, among other things. The rows are retained for a variable length of time based on the call date and the monthly bill preparation date. This is not the same for every customer, and the retention period is rarely more than six weeks. The primary index is the phone number and the call-start timestamp. This implies the primary index was chosen to provide good data distribution across the AMPs. It is also obvious that the primary index was not chosen for data access or to facilitate direct merge joins. Some queries analyze all calls from a particular phone number. Other queries analyze all calls for a particular period of time, perhaps for as long as a month, for customers meeting certain criteria. A non-PPI definition of this table, showing only a few critical columns, follows:

CREATE TABLE CallDetail (
  phone_number DECIMAL(10) NOT NULL,
  call_start TIMESTAMP,
  call_duration INTEGER
  /* other columns omitted */ )
PRIMARY INDEX (phone_number, call_start);

One possibility for partitioning this table would be to cast call_start as a date and partition by date, similarly to the solution in the first example. This would help with inserting new activity in the same manner as in the previous example. Deletion of rows would not get the same performance gain, since the deletes are not strictly by call date and, therefore, the deleted rows would not be clustered in a partition. In this case, the ALTER TABLE statement could not be used, and the process would not reap the same performance benefit that deleting entire partitions provides. The analysis queries that are based on the date of the call would benefit, with queries specifying a range of a few days getting the greatest gain.

Another choice would be to use the phone number as the partitioning column. Phone numbers contain too many digits to give each number its own partition, but a subset of the digits could be used. If the first (high-order) three digits are used, there would be 1000 partitions, some of which would always be empty because of the way phone numbers are assigned.

This partitioning expression would not improve the performance of bulk inserts or deletes, which would be scattered across all partitions. It would not help with date-based queries, but would allow queries specifying a phone number to run much faster, as only one partition would be read out of maybe 500 or more non-empty partitions. A second advantage would be to benefit geographic area analysis since (in some parts of the world, at least) the first three digits identify a particular area.

If 1000 partitions improve performance, 10,000 partitions (the first four digits of the phone number) would probably be even better. If 10,000 partitions were good, maybe 50,000 would be better yet. We cannot have 100,000 partitions, because the limit is 65,535, but we could use the first five digits and assign two consecutive numbers to each partition. Some partitions might be empty due to the way phone numbers are assigned. For example, this table definition creates 50,000 partitions using the first five digits of the phone number:

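CREATE TABLE PPI_CallDetail (
  phone_number DECIMAL(10) NOT NULL,
  call_start TIMESTAMP,
  call_duration INTEGER
  /* other columns omitted */ )
PRIMARY INDEX (phone_number, call_start)
PARTITION BY RANGE_N (
  CAST(phone_number / 100000.00000 AS INTEGER) BETWEEN 0 AND 99999 EACH 2);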
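The following Python model shows how the expression above assigns a 10-digit phone number to one of 50,000 partitions. It assumes that the CAST to INTEGER truncates the fractional part, which for these non-negative values equals integer floor division. The function name and the sample number are hypothetical.

```python
def first_five_digit_partition(phone_number: int) -> int:
    # Model of RANGE_N(CAST(phone_number / 100000.00000 AS INTEGER)
    #                  BETWEEN 0 AND 99999 EACH 2) for a DECIMAL(10) value.
    if not 0 <= phone_number <= 9_999_999_999:
        raise ValueError("expected a non-negative DECIMAL(10) value")
    prefix = phone_number // 100_000   # high-order five digits
    return prefix // 2 + 1             # ranges [0,1], [2,3], ... -> 1, 2, ...

print(first_five_digit_partition(3125550123))  # prefix 31255 -> partition 15628
```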

If it's not important to be able to map a geographic area to one or more partitions, another option would be to maximize the number of partitions by using the partitioning expression (phone_number MOD 65535) + 1. If the table contains about 3.276 billion rows, on average each partition would contain about 50,000 rows. For a system with 100 AMPs, each AMP would on average contain about 500 rows per partition, a number of rows that might fit in one data block if the row width was fairly small. The decrease in response time of a one-partition scan for all activity for a particular phone number would be dramatic compared to the full-table scan that would result with the non-PPI table. A query to return activity for one phone number is a best-case scenario for single-table response time improvement due to PPI. Disregarding the overhead cost of initiating the query and returning the answer set, the elapsed time could be reduced to 1/65535 of the time using the non-PPI table. Including the query initiation and termination overhead, the total query time improvement would be somewhat less than a factor of 65,535, but the PPI query could still take less than 1/10,000 of the non-PPI time. Here is a table definition to use this partitioning:

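CREATE TABLE PPI_CallDetail (
  phone_number DECIMAL(10) NOT NULL,
  call_start TIMESTAMP,
  call_duration INTEGER
  /* other columns omitted */ )
PRIMARY INDEX (phone_number, call_start)
PARTITION BY (phone_number MOD 65535) + 1;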
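The MOD-based mapping and the averages quoted above can be checked directly. The function name is hypothetical. The row count and AMP count come from the text. Python's `%` matches MOD for the non-negative phone numbers assumed here.

```python
def mod_partition(phone_number: int) -> int:
    # Model of PARTITION BY (phone_number MOD 65535) + 1, giving 1..65535.
    return phone_number % 65535 + 1

TOTAL_ROWS = 3_276_000_000   # "about 3.276 billion rows"
AMPS = 100

rows_per_partition = TOTAL_ROWS / 65535
rows_per_partition_per_amp = rows_per_partition / AMPS
print(round(rows_per_partition), round(rows_per_partition_per_amp))
```

This prints roughly 49,989 rows per partition and 500 rows per partition per AMP, matching the "about 50,000" and "about 500" figures.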

The best choice, if any, of these proposed partitioning expressions depends on the mix of anticipated queries. The extended logical data model can serve as the starting point for making the decision, but some amount of testing of different scenarios will often be required.

A Final Example

The previous examples illustrate scenarios where a PPI table is the correct choice. For this example, we examine a more ambiguous situation in which more trade-off considerations apply, and the correct solution is not as evident.

An invoice table contains data about each invoice issued in the past four years. The unique primary index is invoice number. New rows are added nightly using Teradata MultiLoad, and the oldest month of data is deleted once per month. There is a moderately heavy volume of queries that get information about one specified invoice. There are ad hoc analysis queries that examine all invoices for some period of time, usually less than a year. Other tables have invoice number as their primary index, but do not have an invoice date column. There are frequent joins with those other tables.

The DBA is considering whether it would be advantageous to partition the invoice table on invoice date using one-month ranges.

The following are some of the considerations that will apply:

Additional Disk Space Required

The primary index is currently defined as unique, but would have to be defined as non-unique if the table were partitioned, because the partitioning column (invoice date) is not part of the primary index. There is a business requirement to guarantee that invoice numbers are unique. Therefore, the DBA would have to define a unique secondary index on the invoice number column. This secondary index would increase processing times on insert, delete, and update operations, and consume additional disk space. The base table would also be larger, by two bytes per row, further increasing the required disk space.
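The uniqueness restriction behind this consideration can be stated as a one-line check. This is a sketch of the rule as this example applies it: the primary index of a partitioned table can be declared unique only when every partitioning column is also a primary index column. The function name and column names are hypothetical.

```python
def pi_can_be_unique(pi_columns, partitioning_columns) -> bool:
    # True only when every partitioning column is also a primary index column.
    return set(partitioning_columns) <= set(pi_columns)

# Invoice table: PI on invoice_number, partitioned on invoice_date.
print(pi_can_be_unique(["invoice_number"], ["invoice_date"]))
# Sales table from the first example: sales_date is part of the PI.
print(pi_can_be_unique(["product_code", "sales_date", "agent_id"], ["sales_date"]))
```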

Slower Short-Running Queries

PI access queries would now use the unique secondary index to access the row. As a rule of thumb, accessing the row using a secondary index would take roughly two to three times as long as using the primary index for the non-PPI table. On a positive note, PI access is very fast, usually a sub-second operation. Doubling or tripling the response time is likely to go unnoticed by the users who issue those queries.

Slower Long-Running Queries

Direct merge joins (without partition elimination) would at best require more memory and CPU time, and may be measurably slower compared to a similar non-PPI table. The amount of performance degradation will depend on the query conditions, how many partitions can be eliminated, and the specific join plan chosen by the Optimizer. Actual measurement of representative queries will be required to determine the overall difference in performance.

Impact on Table Maintenance

Nightly inserts would benefit in the same way as in the first example, for the same reasons. However, the additional index on invoice number would partially offset the benefit. Since Teradata MultiLoad does not support unique secondary indexes, the index would need to be dropped prior to the MultiLoad job and then recreated after the job. Alternatively, this may be an opportunity to move to a near-real-time load strategy using, for example, Teradata TPump.

The same considerations as in the first example apply to the monthly deletes.

Similarly, in Teradata Database V2R6, benefits may occur with archives and restores of selected partitions.

Faster Ad hoc Queries

Ad hoc queries examining several months of invoices would benefit in the same way as in the first example. The benefit would be greatest when fewer months are examined.

Would it be worthwhile to convert the invoice table to use a PPI? The DBA will need to measure the amount of improvement and degradation in the various types of queries, and determine how much each query type contributes to the overall workload involving this table. This will provide an estimate of the overall workload performance with and without a PPI table. If the difference between PPI and non-PPI table performance is substantial in either direction, the choice will be evident for the overall workload. But the DBA should also consider the relative importance of the various activities. For example, if the nightly insert volume is starting to overwhelm the time set aside for inserting new activity, even a small improvement in load time might be considered important enough to offset larger degradations in queries. Similarly, if the response time of PI queries is critical, even a small degradation in those queries might be considered unacceptable even if overall workload performance is improved. In short, measurement and analysis are required to come to a rational decision for this case.
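The weighing described above can be sketched as a frequency-weighted sum of measured elapsed times. The following Python sketch is a minimal illustration only. The query types, run counts, and timings are hypothetical placeholders, not measurements from this paper. The `QueryType` and `workload_seconds` names are also illustrative.

```python
from dataclasses import dataclass

@dataclass
class QueryType:
    name: str
    runs_per_day: int
    secs_non_ppi: float  # measured elapsed seconds per run without a PPI
    secs_ppi: float      # measured elapsed seconds per run with a PPI

def workload_seconds(types, use_ppi):
    """Daily elapsed time of the workload, weighting each query type by frequency."""
    return sum(
        t.runs_per_day * (t.secs_ppi if use_ppi else t.secs_non_ppi)
        for t in types
    )

# Hypothetical numbers for illustration only; not measurements from this paper.
workload = [
    QueryType("PI access (now via USI)", 10_000, 0.05, 0.12),
    QueryType("direct merge join", 50, 40.0, 55.0),
    QueryType("nightly insert", 1, 3600.0, 900.0),
    QueryType("ad hoc month scans", 20, 600.0, 60.0),
]

before = workload_seconds(workload, use_ppi=False)
after = workload_seconds(workload, use_ppi=True)
for t in workload:
    print(f"{t.name:26} {t.secs_non_ppi:>8} -> {t.secs_ppi:>8}")
print(f"total seconds/day: {before:.0f} without PPI, {after:.0f} with PPI")
```

In this made-up case, PI lookups become more than twice as slow while the daily total drops sharply. A single weighted total cannot express the other consideration raised above, namely that one activity may matter more than its share of elapsed time.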

Specifics of Defining a PPI Table

The PRIMARY INDEX clause of the CREATE TABLE statement may be followed by an optional PARTITION BY partitioning_expression clause. The partitioning expression is a general expression, allowing wide flexibility in tailoring it to the unique characteristics of the table. Two functions, RANGE_N and CASE_N, are provided to simplify the creation of partitioning expressions.
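To make the two functions concrete, here is a small Python model of what each computes for a single row. It is an illustrative sketch under stated assumptions, not Teradata's implementation:

- `range_n_monthly` models only a monthly RANGE_N on a date column whose start is the first day of a month.
- `case_n` models only a list of conditions with an optional NO CASE partition. It does not model the UNKNOWN partition.
- Both function names are hypothetical.

```python
from datetime import date

def range_n_monthly(d, start, months):
    """Model of RANGE_N(d BETWEEN start AND <end of month `months`>
    EACH INTERVAL '1' MONTH). Assumes start is the first day of a month.

    Returns a partition number 1..months, or None (SQL NULL) when d is NULL
    or falls outside every range and no NO RANGE partition is defined.
    """
    if d is None or d < start:
        return None
    offset = (d.year - start.year) * 12 + (d.month - start.month)
    return offset + 1 if offset < months else None

def case_n(row, conditions, no_case=False):
    """Model of CASE_N(cond_1, ..., cond_n [, NO CASE]).

    Returns the 1-based position of the first condition that is true,
    n + 1 if none is true and NO CASE is specified, otherwise None.
    """
    for i, cond in enumerate(conditions, start=1):
        if cond(row):
            return i
    return len(conditions) + 1 if no_case else None

start = date(2002, 1, 1)  # three years of monthly partitions: 2002-2004
print(range_n_monthly(date(2004, 10, 20), start, 36))  # 34
print(range_n_monthly(date(2005, 1, 1), start, 36))    # None

regions = [lambda r: r["region"] == "EAST", lambda r: r["region"] == "WEST"]
print(case_n({"region": "WEST"}, regions, no_case=True))   # 2
print(case_n({"region": "NORTH"}, regions, no_case=True))  # 3
```

Both models return an integer partition number or NULL, consistent with the requirement that a partitioning expression yield a value that is or can be cast to INTEGER. The NULL case is what the NO RANGE and NO CASE options exist to catch.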

One or more columns can make up the partitioning expression, although it is anticipated that, for most tables, one column will be specified. The partitioning columns can be part of the primary index, but are not required to be. The result of the partitioning expression must be a scalar value that is INTEGER or can be cast to INTEGER. Most deterministic functions can be used within the expression. The expression must not require character or graphic comparisons, although character or graphic columns can be referenced in some circumstances.

If the partitioning columns are not all part of the primary index, the primary index cannot be defined as unique, although a unique secondary index can be defined on the same columns as the primary index.

Only base tables can be PPI tables. This excludes global temporary tables, volatile tables, join indexes, hash indexes, and secondary indexes. This restriction does not mean that a PPI table cannot have secondary indexes or cannot be referenced in the definition of a join index or hash index. It merely means that the PARTITION BY clause is not available on a CREATE GLOBAL TEMPORARY TABLE, CREATE VOLATILE TABLE, CREATE INDEX, CREATE JOIN INDEX, or CREATE HASH INDEX statement.

In the general case, there can be up to 65,535 partitions, numbered from one. As rows are inserted into the table, the partitioning expression is evaluated to determine the proper partition placement for each row. A two-byte internal representation of the partition number is embedded in the row as part of the row identifier, making PPI rows two bytes wider than they would be if the table were not partitioned. Secondary indexes referencing PPI tables use the wider row identifier, making those rows wider as well.¹ Except for the embedded internal partition number, PPI rows have the same format as non-PPI rows. A data block can contain rows from multiple consecutive partitions. There are no new control structures to implement the partitioning expression.
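The ordering implied by a partition number embedded in the row identifier can be modeled in a few lines of Python. This sketch makes two assumptions that go beyond the text:

- A non-PPI row identifier consists of a 4-byte row hash plus a 4-byte uniqueness value, so the PPI identifier is 10 bytes.
- Big-endian packing is used only so that byte order matches sort order. The real on-disk layout may differ.

`row_id` is a hypothetical name.

```python
import struct

MAX_PARTITIONS = 65_535  # partitions are numbered from 1 (general case per the text)

def row_id(partition, row_hash, uniqueness):
    """Hypothetical 10-byte PPI row identifier: 2-byte partition number,
    then (assumed) 4-byte row hash and 4-byte uniqueness value."""
    if not 1 <= partition <= MAX_PARTITIONS:
        raise ValueError(f"partition {partition} outside 1..{MAX_PARTITIONS}")
    # '>' = big-endian with no padding, 'H' = unsigned 16-bit, 'I' = unsigned 32-bit.
    return struct.pack(">HII", partition, row_hash, uniqueness)

# Rows sort by partition first, then row hash, so one data block can hold
# the tail of one partition and the head of the next.
rows = [(2, 0x00000010, 1), (1, 0xFFFF0000, 1), (1, 0x00000020, 1)]
for rid in sorted(row_id(*r) for r in rows):
    print(struct.unpack(">HII", rid))

# Extra space for the wider row identifier: two bytes per row (decimal GB).
row_count = 1_000_000_000
print(f"{2 * row_count / 1e9:.0f} GB extra for {row_count:,} rows")
```

Two bytes is exactly enough for partition numbers 1 through 65,535, the largest value an unsigned 16-bit field can hold. The same two-byte-per-row overhead applies to each secondary index row that carries the wider identifier.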

Sample uses of partitioning expressions were shown in the discussions of the examples presented earlier. While the examples were simple, the partitioning expression is a general expression, which makes it possible to define complex partitioning schemes tailored to the processing needs of individual tables. However, a simple partitioning expression (for instance, RANGE_N on a single date column) may provide the best opportunities for partition elimination in queries.

The Optimizer does partition elimination for a query by analyzing the constraints on the partitioning columns in the context of the partitioning expression. Constraints that equate the partitioning columns to constant expressions provide partition elimination. Also, range constraints that compare the partitioning column to constant expressions can provide partition elimination when the partitioning expression is a single column or a RANGE_N function on a single column. In some cases, the constant expressions may contain USING variables and still provide partition elimination.
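For a single-column RANGE_N, range elimination reduces to arithmetic on the bounds of the constraint. The Python below is a simplified model, not the Optimizer's algorithm. It assumes monthly RANGE_N partitions starting on the first day of a month and an inclusive range constraint with constant bounds. The function names are hypothetical.

```python
from datetime import date

def month_partition(d, start):
    """Partition number that date d would map to under a monthly RANGE_N
    starting at `start` (the result may fall outside the defined partitions)."""
    return (d.year - start.year) * 12 + (d.month - start.month) + 1

def surviving_partitions(lo, hi, start, months):
    """Partitions that can contain rows satisfying lo <= col <= hi.
    All other partitions are eliminated and never read."""
    first = max(1, month_partition(lo, start))
    last = min(months, month_partition(hi, start))
    return list(range(first, last + 1)) if first <= last else []

start = date(2003, 1, 1)  # two years of history, 24 monthly partitions
two_months = surviving_partitions(date(2004, 9, 1), date(2004, 10, 31), start, 24)
print(two_months, f"{len(two_months)}/24 partitions read")  # [21, 22] 2/24 partitions read
# An equality constraint is the degenerate range lo == hi.
print(surviving_partitions(date(2004, 10, 20), date(2004, 10, 20), start, 24))  # [22]
```

Reading 2 of 24 monthly partitions is one-twelfth of the table, the same ratio used in the trade-off discussion later in this paper.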

Joins that equate the primary index columns of a partitioned table to the columns of another table are also optimized when there is a small number of non-eliminated partitions. In this case, a set of partitions can be read directly in a sliding window of merge joins, thereby avoiding spooling the partitioned table before the join. If the tables are also joined by equality on the partitioning columns, a rowkey merge join simplifies and improves the performance of the merge join.
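The sliding-window idea can be modeled in Python. Within one partition, rows are already in row-hash order, so merging a window of partitions yields one hash-ordered stream. That stream can then be merge-joined with the other table in a single pass. This is a simplified sketch, not Teradata's join code. It treats each row's join value as its row hash (ignoring hash collisions), works on a single AMP's rows, and uses hypothetical names.

```python
import heapq
from itertools import groupby

def merge_join(left, right):
    """Equi-join two iterables of (key, payload) already sorted by key."""
    out = []
    lg = groupby(left, key=lambda r: r[0])
    rg = groupby(right, key=lambda r: r[0])
    l, r = next(lg, None), next(rg, None)
    while l is not None and r is not None:
        if l[0] < r[0]:
            l = next(lg, None)
        elif l[0] > r[0]:
            r = next(rg, None)
        else:
            # Materialize both groups before advancing the groupby iterators.
            lrows, rrows = list(l[1]), list(r[1])
            out.extend((a, b) for a in lrows for b in rrows)
            l, r = next(lg, None), next(rg, None)
    return out

def sliding_window_join(partitions, other, window):
    """partitions: non-eliminated partitions of the PPI table, each a list of
    (key, payload) sorted by key. other: the other table's rows sorted by key.
    Each window of partitions is k-way merged and joined in one pass over
    `other`, so `other` is scanned ceil(len(partitions) / window) times."""
    out = []
    for i in range(0, len(partitions), window):
        stream = heapq.merge(*partitions[i:i + window], key=lambda r: r[0])
        out.extend(merge_join(stream, other))
    return out

parts = [[(1, "a"), (5, "b")], [(2, "c"), (5, "d")], [(3, "e")]]
other = [(2, "X"), (5, "Y"), (9, "Z")]
print(sliding_window_join(parts, other, window=2))
```

A wider window means fewer passes over the other table, but more partitions must be read concurrently. That trade-off helps explain why this plan suits cases with a small number of non-eliminated partitions.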

In Teradata Database V2R5.1, dynamic partition elimination can occur when there is an equality constraint between the partitioning column of one table and a column of another table. This is useful when looking up rows in one table and matching them to the corresponding partitions of the other table (using a product join), instead of product joining them to the entire table. Teradata Database V2R6 further extends dynamic partition elimination to merge joins.
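A rough Python model of dynamic partition elimination follows. The partitions to read are not known until the other table's values are looked up. After that lookup, only the partitions those values map to are product-joined. The monthly partitioning, the (sale_date, amount) row shape, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

def partition_of(d, start):
    """Monthly RANGE_N model: partitions numbered from 1 starting at `start`."""
    return (d.year - start.year) * 12 + (d.month - start.month) + 1

def dynamic_pe_join(ppi_rows, lookup_dates, start):
    """Join PPI rows (sale_date, amount) to lookup_dates on sale_date,
    reading only partitions that some lookup value maps to.
    Returns (matches, partitions_read)."""
    by_partition = defaultdict(list)  # models rows stored grouped by partition
    for row in ppi_rows:
        by_partition[partition_of(row[0], start)].append(row)
    # Partitions become known only after the lookup values are available.
    needed = sorted({partition_of(d, start) for d in lookup_dates})
    matches = [
        (row, d)
        for p in needed
        for row in by_partition.get(p, [])  # product join within one partition
        for d in lookup_dates
        if row[0] == d
    ]
    return matches, needed

start = date(2004, 1, 1)
sales = [(date(2004, 1, 5), 10), (date(2004, 2, 7), 20), (date(2004, 3, 9), 30)]
matches, read = dynamic_pe_join(sales, [date(2004, 2, 7)], start)
print(matches, read)
```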

Another enhancement in Teradata Database V2R6 provides partition elimination on the referencing rowids of a secondary index. Instead of looking up all the rows in the base table for a particular index value, only rows in the base table referenced by rowids pointing to non-eliminated partitions are read.
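The V2R6 secondary-index enhancement amounts to filtering the index's row identifiers by partition before the base table is touched. Here is a minimal Python sketch. It assumes each row identifier is modeled as a (partition, row_hash, uniqueness) tuple and uses a hypothetical function name.

```python
def rowids_to_fetch(index_rowids, surviving):
    """Keep only row identifiers that point into non-eliminated partitions;
    rows in eliminated partitions are never read from the base table."""
    keep = set(surviving)
    return [rid for rid in index_rowids if rid[0] in keep]

# Hypothetical rowids for one secondary index value, in partitions 3, 21, and 22.
rowids = [(3, 0x1A2B, 1), (21, 0x0F00, 1), (22, 0x0F00, 2)]
print(rowids_to_fetch(rowids, surviving=[21, 22]))
```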

Teradata Database V2R6 also makes Non-Unique Secondary Index (NUSI) access a single-AMP operation if the NUSI is on the same columns as the Non-Unique Primary Index (NUPI) and there is an equality condition on the NUSI. Note that a NUSI on the same columns as the NUPI is only allowed for a PPI table. This potentially provides a faster access path than using the NUPI, with the same single-AMP and rowhash locking characteristics. This can occur when the number of occurrences of a NUSI value is less than the number of partitions.
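The break-even condition in the last sentence can be expressed as a crude probe count. This Python sketch is only a back-of-the-envelope model. It relies on assumptions not stated in the text:

- A NUPI lookup that cannot eliminate partitions probes every partition once.
- A NUSI lookup costs one index probe plus one probe per referenced row.

Real costs depend on blocks read, not probes. The function name is hypothetical.

```python
def compare_probes(occurrences, partitions, index_probes=1):
    """Rough probe counts for one equality lookup on a PPI table whose
    partitioning columns are not part of the NUPI (no partition elimination).
    NUPI path: probe each partition for the row hash.
    NUSI path: read the NUSI row, then fetch each referenced base row."""
    nupi = partitions
    nusi = index_probes + occurrences
    return {"nupi": nupi, "nusi": nusi, "prefer": "NUSI" if nusi < nupi else "NUPI"}

print(compare_probes(occurrences=3, partitions=36))    # few duplicates: NUSI cheaper
print(compare_probes(occurrences=100, partitions=36))  # many duplicates: NUPI cheaper
```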

As mentioned earlier, Teradata Database V2R6 provides for archives and restores of selected partitions.

The ALTER TABLE statement has been extended to support PPI. The discussion of the first example ("How PPI Solves the Business Problem") showed how to use it to drop the partition containing the oldest transactions and create expansion partitions for future dates. This is a simple example, but it does illustrate the capability. The ability to ALTER a PPI table provides a simple and convenient mechanism for the DBA to perform periodic maintenance on a range-based PPI table.



¹ Starting with Teradata Database V2R5, a join index or hash index that references a table using a row identifier uses the wider format whether or not the table has a partitioned primary index.

High-Level Partitioning Guidelines

Here are some general guidelines, with limited discussion, to help determine whether and how to partition a table:

1. Large tables are good candidates for partitioning.

2. Partition on a column that is frequently used as a restrictive query condition.

3. If other factors are equal, partition on a column that is part of the primary index in preference to a column that is not, unless the primary index is seldom, if ever, used for access or joins.

4. While there are few restrictions on the partitioning expression, a partitioning expression is only useful if the Optimizer can effectively apply partition elimination to queries. A simple partitioning expression is more likely than a complex one to yield the maximum amount of partition elimination. For example, a RANGE_N function on a date column can often be an effective partitioning expression for queries with range constraints on the partitioning column.

5. Use RANGE_N or CASE_N in preference to direct use of a column in most situations. The Optimizer can determine the maximum number of partitions when RANGE_N or CASE_N is used, and must assume 65,535 partitions otherwise. Join costing, in particular, can be more accurate when the actual number of partitions is known and fairly small than when the number is assumed to be 65,535.

6. Unless the PI is rarely used for access or direct merge joins, keep the number of partitions fairly small when the partitioning expression uses columns that are not part of the PI.

7. The same considerations regarding the selection of the primary index apply to PPI tables as to non-PPI tables. Choose PI columns that provide good distribution, avoid large clumps of duplicate PI values, and are most commonly used to access individual rows in the table. Sometimes those two considerations conflict, and a reasonable compromise between them must be reached.
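Guidelines 5 and 6 can be made concrete by counting partitions. This Python sketch rests on the following assumptions:

- The RANGE_N ranges are monthly or daily over calendar dates.
- A PI lookup that is not given the partitioning values must probe every partition.

The function names are hypothetical. The 65,535 figure is the Optimizer's assumption for a directly used column, as stated in guideline 5.

```python
from datetime import date

ASSUMED_PARTITIONS = 65_535  # Optimizer's assumption without RANGE_N or CASE_N

def monthly_partitions(first, last):
    """Partitions from RANGE_N(... BETWEEN first AND last EACH INTERVAL '1' MONTH),
    assuming first is the first day of a month and last the last day of one."""
    return (last.year - first.year) * 12 + (last.month - first.month) + 1

def daily_partitions(first, last):
    """Partitions from RANGE_N(... BETWEEN first AND last EACH INTERVAL '1' DAY)."""
    return (last - first).days + 1

first, last = date(2002, 1, 1), date(2004, 12, 31)
for label, n in [("monthly RANGE_N", monthly_partitions(first, last)),
                 ("daily RANGE_N", daily_partitions(first, last)),
                 ("direct column (assumed)", ASSUMED_PARTITIONS)]:
    # A PI lookup that cannot eliminate partitions probes each one.
    print(f"{label:24} {n:>6} partitions to probe per PI lookup")
```

Three years of data give 36 monthly or 1,096 daily partitions, both far fewer than the 65,535 the Optimizer would otherwise have to assume.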

A more detailed description of partitioning guidelines may be found in the Teradata Orange Book: Partitioned Primary Index Usage.

High-Level Trade-off Considerations

The greatest potential gain from partitioning a table is the ability to read a small subset of the table instead of the entire table. For example, a query that examines two months of sales from a table with two years of sales history would read about one-twelfth of the table (2 of 24 months) instead of all of it. This can provide a large performance boost for a wide range of queries, day after day, and is automatic. SQL authors need not be aware of the partitioning structure, and no changes are required to existing SQL.

A second potential advantage is faster batch loads. If the table is partitioned by transaction date, nightly loads of transactions for the current day can be dramatically improved. Similarly, deleting old rows that are no longer needed can be dramatically faster (nearly instantaneous in some cases) when the table is partitioned by transaction date.

Finally, with Teradata Database V2R6, you can perform archives and restores of selected partitions. This allows for more frequent but less costly archives. For restores, critical data (for example, in the most recent partitions) can be restored quickly and made available to users without waiting for the entire table to be restored.

In the above situations, the improvement may be even greater when the partitioning structure makes one or more secondary indexes or join indexes redundant, allowing those indexes to be dropped.

Offsetting these gains are some potential disadvantages of partitioning. The first disadvantage is that PI access of the table may be slower when a partitioning column is not part of the PI. This disadvantage can be offset in one of three ways: by choosing partitioning columns that are part of the PI, by specifying the values of both the partitioning columns and the PI columns in queries, or, in some situations, by defining a secondary index.

A second disadvantage is that direct merge joins involving a partitioned table may be slower unless both tables are identically partitioned. This disadvantage can be offset when the query conditions allow some partitions to be eliminated from the join.

As with all physical design choices, you must weigh the trade-offs and test your assumptions to get the best results.

A more detailed description of trade-off considerations may be found in the Teradata Orange Book: Partitioned Primary Index Usage.

Summary

PPI tables can dramatically improve performance of certain types of queries, especially those that access only a small part of a large table. High-volume data load and data maintenance times can also be improved when, for example, the transaction date is specified as the partitioning column.

A partitioned primary index is flexible and easy to use. PPI tables retain the traditional uses of primary indexes to distribute data evenly and provide very fast access when the primary index value is specified in the query.

No changes to existing SQL are necessary. Users accessing a PPI table will see no difference, except perhaps in average response times.

Whether and how to partition the primary index of a table is a physical design choice. The trade-offs associated with a PPI should be understood and considered when making physical design decisions.

The extended logical data model can serve as the starting point for making physical design decisions, but some testing of different scenarios will often be required. As with other physical design decisions, the total workload and the relative importance of its components must be examined to determine whether the benefits will outweigh the disadvantages for each design decision.



Teradata and NCR are registered trademarks of NCR Corporation. NCR continually enhances products as new technologies and components become available. NCR, therefore, reserves the right to change specifications without prior notice. All features, functions, and operations described herein may not be marketed in all parts of the world. Consult your Teradata representative or visit Teradata.com for more information. No part of this publication may be reprinted or otherwise reproduced without permission from Teradata.

This document, which includes the information contained herein, is the exclusive property of NCR Corporation. Any person is hereby authorized to view, copy, print, and distribute this document subject to the following conditions. This document may be used for non-commercial, informational purposes only and is provided on an AS-IS basis. Any copy of this document or portion thereof must include this copyright notice and all other restrictive legends appearing in this document. Note that any product, process or technology described in the document may be the subject of other intellectual property rights reserved by NCR and are not licensed hereunder. No license rights will be implied. Use, duplication or disclosure by the United States government is subject to the restrictions set forth in DFARS 252.227-7013 (c) (1) (ii) and FAR 52.227-19.

2004 NCR Corporation Dayton, OH U.S.A. Produced in U.S.A. All Rights Reserved.

Teradata.com
