Partitioned Primary Indexes
Jerry Klindt
(updated by Paul Sinclair)
October 20, 2004
Data Warehousing > Database
Introduction
Some common business queries generally
require a full-table scan of a large table
even though it's predictable that a fairly
small percentage of the rows will qualify.
One example of such a query is a trend
analysis application that compares current
month sales to the previous month, or to
the same month of the previous year,
using a table with several years of sales
detail. Another example is an application
that compares customer behavior in one
geographic region to another region.
Prior to Teradata Database V2R5, there
were few viable opportunities for a
Database Administrator (DBA) to structure
the data warehouse in a manner that
allowed such queries to avoid full-table
scans. Starting with Teradata Database
V2R5, the DBA has a flexible and powerful
tool to structure tables to allow automatic
optimization of frequently used queries
of this class. That tool is the partitioned
primary index (PPI). A PPI allows a table
to be partitioned on columns of interest
while retaining the traditional use of the
primary index (PI) for data distribution
and efficient access when the PI values are
specified in the query.
A carefully-chosen partitioning expression
can result in partial-table scans instead
of full-table scans with dramatic improvements
in resource consumption and
elapsed time (elapsed time decreases of
99% or more are possible). Batch insert
and update times may also be improved
when the partitioning column is chosen
to match the arrival pattern of the data
(elapsed time decreases of 90% or more
are possible).
EB-1889 > 1204 > PAGE 2 OF 14
Table of Contents
Executive Summary
Introduction
Definitions and Basics
How Much Can PPI Improve Performance?
How PPI Solves the Business Problem: Example One
Can the First Example Be Improved Further?
A Second Example
A Final Example
Specifics of Defining a PPI Table
High-Level Partitioning Guidelines
High-Level Trade-off Considerations
Summary
Executive Summary
Partitioned primary indexes, introduced in Teradata
Database V2R5, provide an opportunity to greatly
improve performance of certain queries, and to improve
the performance of high-volume insert, update, and
delete operations. The feature is flexible, yet easy to use,
and is largely transparent to end users.
The process for physically defining the
partitioning expression, via the CREATE
TABLE statement, is simple and straightforward.
This paper gives some examples.
As is true for all physical database design
decisions, there are trade-off considerations
associated with each possible choice.
It's beyond the scope of this paper to discuss
the trade-off considerations at length.
The objective of this paper is to provide
realistic examples and actual performance
comparisons using PPI and non-PPI
solutions.
Definitions and Basics
In the context of PPI, partitioning refers
to the physical ordering of rows within
the table. The ordering is automatically
provided by the database management
software, and is determined by a user-
specified expression called the partitioning
expression. Physically, a PPI table is
substantially the same as a non-PPI table
except for the ordering of rows. More
specifically, the PI value is hashed to
distribute a row to a particular AMP in
an identical fashion for PPI and non-PPI
tables. Within each AMP, rows are ordered
by PI hash for non-PPI tables, and by
partition number first then PI hash for
PPI tables.
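The difference in row ordering can be illustrated with a small Python sketch. It is only a model: zlib.crc32 stands in for the PI row hash (an assumption for illustration, not Teradata's actual hashing algorithm), and the sample rows and the month-based partition function are hypothetical.

```python
import zlib
from datetime import date

def pi_hash(pi_value: str) -> int:
    # Stand-in for the PI row hash; NOT Teradata's actual hash function.
    return zlib.crc32(pi_value.encode("utf-8"))

def month_partition(d: date, first: date = date(2004, 8, 1)) -> int:
    # Hypothetical partitioning expression: one partition per month, numbered from 1.
    return (d.year - first.year) * 12 + (d.month - first.month) + 1

# Hypothetical rows stored on one AMP: (PI value, transaction date).
rows = [("A17", date(2004, 10, 3)), ("B02", date(2004, 8, 9)),
        ("C44", date(2004, 9, 21)), ("D08", date(2004, 8, 30)),
        ("E51", date(2004, 10, 12)), ("F23", date(2004, 9, 2))]

# Non-PPI table: rows ordered by PI hash only.
non_ppi_order = sorted(rows, key=lambda r: pi_hash(r[0]))

# PPI table: rows ordered by partition number first, then by PI hash.
ppi_order = sorted(rows, key=lambda r: (month_partition(r[1]), pi_hash(r[0])))

ppi_partitions = [month_partition(r[1]) for r in ppi_order]
print(ppi_partitions)  # [1, 1, 2, 2, 3, 3]
```

In the PPI ordering, all rows of a given month sit next to each other, which is what makes it possible to read or skip a partition as a contiguous run of data blocks.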
The partitioning expression is specified
on the CREATE TABLE statement in a
PARTITION BY clause following the
PRIMARY INDEX definition. The result
of the expression must be an integer value
or a value that can be cast to integer, and
the result indicates the partition number.
The columns referenced in the partitioning
expression are called the partitioning
columns. A partition number must be
between 1 and 65,535, inclusive; therefore,
the maximum number of partitions that
can be defined for a table is 65,535.
Accessing a particular partition of a table
means accessing a subset of the table
beginning with the data block containing
the first row belonging to the partition
(on each AMP), and extending to the data
block containing the last row belonging
to the partition. The number of data
blocks will be zero if there are no rows
belonging to that partition (although it
may be necessary to read one data block
to determine that there are no rows for
the partition).
The term partition elimination refers to
an automatic optimization in which the
Optimizer determines, based on query
conditions and the partitioning expression,
that some partitions cannot contain
qualifying rows, and causes those partitions
to be skipped. Partitions that are skipped
for a particular query are called eliminated
partitions. Generally, the greatest benefit
of a PPI table is obtained from partition
elimination.
The term direct merge join is used to
describe a join in which the table of interest
is not spooled in preparation for a merge
join. The Optimizer may choose a direct
merge join when all columns of the PI are
specified in equality join terms.
The term direct product join is used to
describe a join in which the table of interest
is not spooled in preparation for a product
join. The Optimizer may choose a direct
product join when all the partitioning
columns are specified in equality join terms.
The ordering of rows within a table is
transparent to application developers, but
there are trade-off considerations involving
queries with partitioning column conditions,
queries that specify one or a few PI
values, and queries that perform joins on
the PI columns. We will briefly discuss
these trade-off considerations in subsequent
sections.
How Much Can PPI Improve Performance?
The performance gain depends on the
number of partitions and the specific
query being measured. In the best case,
the elapsed time reduction factor for a
specific query against a single table can
approach the reciprocal of the number
of partitions in the table. This means that
best-case PPI queries can take less than
1/100 of one percent of the time they
would take with a non-PPI table. The best
performance improvement occurs when
there are many partitions with reasonably
even distribution of rows among the
partitions, and partition elimination
excludes all except one partition.
Figure 1 shows the results of actual
performance tests. The Baseline column
is the performance for a non-PPI table,
and the PPI column is the performance
for a PPI counterpart table. These tests are
considered to be realistic, but your results
may vary.
How PPI Solves the Business Problem: the First Example
We start the discussion of when a PPI is
most appropriate by showing the differences
between a PPI and non-PPI table
for a few examples. For the first example,
we stipulate a table and some processing
requirements, discuss the options available
prior to Teradata Database V2R5, and
discuss the optimization opportunities a
PPI provides.
Our hypothetical company has a large
sales table containing the details of each
transaction for the previous 24 full
months plus the current month-to-date.
Once per month, the transactions from
the oldest month are deleted. Current
transactions are added to the table nightly
using Teradata MultiLoad. Most transactions
are added on the date they occur,
but a small percentage of transactions may
be reported a few days after they occur.
The number of transactions per month is
roughly the same for all months.
Figure 1. Actual Performance Test Results
| Test Description | Baseline | PPI | Improvement |
|---|---|---|---|
| Select rows that have a specified value of the partitioning column (200 partitions with roughly the same number of rows each) | 59 seconds | one second | 98% reduction in elapsed time |
| Select a month of activity from one partition containing six months of data (11 years of data contained in 40 partitions of unequal size) | 58 seconds | two seconds | 96% reduction in elapsed time |
| Delete rows that have a specified value of the partitioning column (200 partitions of equal size) | 239 seconds | one second | more than 99% reduction in elapsed time |
| Update one column in each row that has a specified value of the partitioning column (200 partitions of equal size) | 237 seconds | three seconds | 98% reduction in elapsed time |
| MultiLoad insert a number of rows equal to 1% of the table size into one partition (of 200) | 1394 rows per second per node (larger numbers are better) | 14,742 rows per second per node | more than ten times faster |
| MultiLoad insert a number of rows equal to 1% of the table size into one partition (of 200) with one NUSI defined on the table | 841 rows per second per node | 5666 rows per second per node | more than six times faster |
Each row contains, among other things,
the product code for the item, the transaction
date, an identifier for the sales agent,
and the quantity sold. The rows are short,
and the data blocks are large. The PI is a
composite of product code, transaction
date, and the agent identification. The
non-PPI definition of this table, showing
only a few of the most important columns,
is as follows:
CREATE TABLE SalesTable (
product_code CHAR(8),
sales_date DATE,
agent_id CHAR(8),
quantity_sold INTEGER,
other_columns CHAR(50))
PRIMARY INDEX (product_code,sales_date, agent_id);
There are four major categories of queries
against this table:
> A modest number of short-running
queries specify the PI values.
> Many ad hoc queries have the following
general pattern:
  - Compare one month of activity to
    another month, or
  - Compare current-month-to-date
    sales to the same days of the
    previous month or to the same days
    of the same month of the previous
    year for a few product code values.
> Some queries analyze agent performance,
usually over an interval of a
calendar quarter or less.
> Some queries examine sales trends over
the previous 24 full months, usually for
most or all product code values.
No other tables have the same PI
definition. The sales table is frequently
joined to relatively small tables containing
information about each product code and
each sales agent.
The DBA, prior to Teradata Database
V2R5, had a need to speed up ad hoc
queries and agent analysis queries. The
DBA considered creating a value-ordered
secondary index or join index on the
transaction date column, and had set up
tests for those scenarios. After running
and analyzing EXPLAINs, the DBA had
found that the Optimizer had determined
that neither index was selective enough to
be an improvement over a full-table scan.
The DBA then considered splitting the
table into 25 separate tables, each containing
transactions for a calendar month.
Then, the DBA would create a view with
a UNION of all the tables for use by the
applications that analyze 24 months of
sales history. The DBA concluded that
this solution could indeed speed up the
targeted queries, but it added too much
complexity for the end users. Users would
have to understand the structure and
change the table names in their queries,
code more complicated UNION statements,
and select appropriate date ranges
and product code ranges. The need to
know the appropriate table name (from
the 25 different tables) would also apply
to applications submitting short-running
queries that specify the primary index.
This solution would also complicate
nightly load jobs, especially in the first few
days of a month when a few of the transactions
would be from the prior month.
The solution would also complicate the
archive strategy. In the end, this solution
was rejected as being too complicated and
error-prone.
With PPI, there's an excellent solution
for this example scenario. By adding a
PARTITION BY clause to the definition
of the replacement PPI table, it would be
easy to create 25 partitions, one for each
month (assuming the current date is in
October 2004).
CREATE TABLE PPI_SalesTable (
product_code CHAR(8),
sales_date DATE,
agent_id CHAR(8),
quantity_sold INTEGER,
other_columns CHAR(50))
PRIMARY INDEX (product_code, sales_date, agent_id)
PARTITION BY RANGE_N (sales_date BETWEEN
DATE '2002-10-01' AND DATE '2004-10-31' EACH INTERVAL '1' MONTH);
The RANGE_N function was used in this
scenario to specify the beginning and
ending dates and the granularity of the
partitioning.
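To make the mapping from a date to a partition number concrete, the following Python sketch models the monthly ranges used in this example. It is an illustrative model under stated assumptions, not the database's implementation: the function name range_n_month is hypothetical, the start date is assumed to be the first day of a month, and None is used to signal a date that falls outside every defined range.

```python
from datetime import date
from typing import Optional

def range_n_month(d: date, start: date, end: date) -> Optional[int]:
    """Model of monthly range partitioning between start and end (inclusive).

    Returns the 1-based partition number, or None when d is outside the
    defined ranges. Assumes start is the first day of a month.
    """
    if d < start or d > end:
        return None
    return (d.year - start.year) * 12 + (d.month - start.month) + 1

START, END = date(2002, 10, 1), date(2004, 10, 31)

print(range_n_month(date(2002, 10, 1), START, END))   # 1
print(range_n_month(date(2004, 9, 15), START, END))   # 24
print(range_n_month(date(2004, 10, 31), START, END))  # 25
print(range_n_month(date(2004, 11, 1), START, END))   # None
```

October 2002 maps to partition 1 and October 2004 to partition 25, which matches the 25 monthly partitions described for this table.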
By converting the sales table into a table
partitioned by transaction month, many
of the queries would run faster (in this
scenario) with no significant negative
trade-off considerations. Let's examine
each element of the stated workload as it
applies to the newly-partitioned table in
more detail.
Faster Monthly Deletes
Instead of using Teradata MultiLoad to
delete rows, the DBA could submit an
ALTER TABLE statement on a monthly
basis (see the next example) to drop the
oldest partition and delete its rows, and at
the same time create a new partition that
would contain data for the upcoming
month. Additional partitions for future
months could be added if desired. A delete
of all the rows in a partition is optimized
in much the same way that a delete of all
rows in a table has historically been
optimized. In both cases, there is no need
to record the individual rows in the
transient journal as they're deleted. The
rows for the month being deleted are
physically stored contiguously (on each
AMP) instead of being scattered more or
less evenly among all the data blocks, as in
the non-PPI table, so there would be fewer
data blocks with rows to be deleted. Most
of the deletes would be full-block deletes
so the data block would not have to be
read or rewritten. Only one data block per
AMP would contain rows for the oldest
month plus the second oldest month, and
that would be the only data block read,
updated, and rewritten. There is also no
need to touch any of the rows for the
other month partitions. Dropping the
oldest partition(s) with an ALTER TABLE
statement is a nearly instantaneous
operation assuming there are no secondary
indexes or join indexes that require
updates, there are no retained or added
partitions (such as NO RANGE) to move
the rows, and the option to make a copy of
the deleted rows is not specified. For
example, to drop the partition and delete
the rows for October 2002, and create a
partition for November 2004,
you would submit:
ALTER TABLE SalesTable MODIFY PRIMARY INDEX (product_code, sales_date, agent_id)
DROP RANGE BETWEEN
DATE '2002-10-01' AND DATE '2002-10-31'
ADD RANGE BETWEEN DATE '2004-11-01' AND DATE '2004-11-30'
WITH DELETE;
Faster Teradata MultiLoad Inserts
The nightly Teradata MultiLoad insert
job would run faster than it did for the
non-PPI table. Instead of the inserted rows
distributing more or less evenly among
all the data blocks of the table, as with the
non-PPI table, the inserted rows would be
concentrated in data blocks for the proper
month. This would increase the average
"hits per block" count (a key measure of
Teradata MultiLoad efficiency) and reduce
the number of data blocks that must be
read and rewritten.
Virtually No Change to Short-Running Queries
Short-running queries that specify primary
index values would run approximately
as fast as on the non-PPI table. Since the
partitioning column is part of the primary
index, the PI access performance would not
be significantly changed.
Significant Performance Gains in Ad hoc Queries
Large gains would be seen in ad hoc queries
that, for example, compare a recent month
of sales data to a prior month. Due to
partition elimination, only two of the
25 partitions would be read instead of the
full-table scan required on the non-PPI
table. This means that the number of disk
reads would be reduced by roughly 92%
with a proportional reduction in elapsed
time. The 92% figure applies to the step
that reads the sales table, not to the sum
of all the steps used to accomplish the
query. Given the stated assumptions, the
other steps should take roughly the same
amount of time as for the non-PPI table.
The same considerations apply to the
agent analysis queries. The number of
partitions read is determined by the time
period specified in the query. Even if the
analysis is for twelve full months, there is
still roughly a 50% gain in reading twelve
of 25 partitions for the step that reads the
sales table.
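The percentages quoted above follow from simple arithmetic, sketched here in Python. The helper name scan_reduction is hypothetical, and the calculation assumes equal-sized partitions, which the scenario states only approximately ("roughly the same" number of transactions per month).

```python
def scan_reduction(partitions_read: int, total_partitions: int) -> float:
    """Fraction of the table-read step avoided, assuming equal-sized partitions."""
    return 1 - partitions_read / total_partitions

# Month-to-month comparison: two of 25 partitions are read.
print(f"{scan_reduction(2, 25):.0%}")   # 92%

# Twelve-month agent analysis: twelve of 25 partitions are read.
print(f"{scan_reduction(12, 25):.0%}")  # 52%
```

As in the text, these figures apply only to the step that reads the sales table, not to the query as a whole.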
No Degradation to Queries Requiring a Full-Table Scan
Decision support queries that analyze 24
months of sales data would take roughly
the same time and resources as for the non-
PPI table. There would be a small gain from
reading 24 instead of 25 partitions. If the
analysis is for 24 months plus the current
month (i.e., the entire table), the resource
usage is the same as for the non-PPI table.
Virtually No Degradation for Joins
Joins would take roughly the same amount
of time. In this example, since there are no
other tables with the same primary index,
there are no direct merge joins to the sales
table. Joins to the product table and agent
table would most likely use the same join
strategy as when the sales table was not
partitioned. The join strategy would
typically be either a duplication of a small
table followed by a product join to the
sales table, or a redistribution of a spool
file followed by a merge join. Neither
strategy is less efficient with the partitioning
of the sales table. Joins could even be
faster depending on the specific query
conditions and the possibility of partition
elimination.
More Efficient Archiving and Restoring
In Teradata Database V2R6, partitions can
also be selectively archived, restored, and
copied. This can significantly reduce the
time to archive data by only archiving the
recently changed partitions. Restores of
selected partitions can be used to quickly
reload critical partitions.
Additional Disk Space Required
The partitioned sales table would require
somewhat more disk space than the non-
partitioned counterpart due to the two-
byte partition number recorded in each
row. For this example, the percentage of
increase would be less than 3%.
Figure 2 summarizes the improvement
opportunities for the example.
Figure 2. Example of PPI Improvement Opportunities
| Activity | Non-PPI Table | PPI Table | Improvement | Comments |
|---|---|---|---|---|
| Nightly inserts | Inserted rows scattered throughout table | Inserted rows concentrated in one partition | Faster performance | No changes to load script needed. |
| Monthly delete of one month of data | MultiLoad job reads most data blocks, updates most data blocks | ALTER TABLE statement deletes partition | Much faster performance | Easier maintenance |
| Primary index access | One data block read | One data block read | No change | No SQL changes needed |
| Comparison of current month to prior month | All data blocks read | Two partitions read | Step is 12 times faster (two partitions of 25 read) | No SQL changes needed |
| Trend analysis over entire table | All data blocks read | All data blocks read | Little change | Rows are two bytes longer for PPI. 2% more data blocks for 100-byte rows. |
| Joins | No direct merge joins | No direct merge joins | Little change | No direct merge joins due to choice of primary index. |
| Archive/Restore (in Teradata Database V2R6) | Entire table | Entire table or selected partitions | Faster archives for selected partitions | Saves having to re-archive data already archived |
Can the First Example Be Improved Further?
The first PPI solution, outlined above,
was to partition by month since many of
the queries use a month as their basic unit
of time. Another option to consider is
partitioning to a finer level. Let's compare
partitioning by month to partitioning by
day using the following PARTITION BY
clause:
PARTITION BY RANGE_N (sales_date BETWEEN
DATE '2002-10-01' AND DATE '2004-10-31' EACH INTERVAL '1' DAY);
The table would now have about 760
partitions (two years with 365 days each
plus the current month of about 30 days).
Some small number of partitions, the
ones corresponding to future dates in the
current month, would be empty.
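The "about 760" figure can be checked with date arithmetic. The short Python sketch below counts the one-day ranges between the two boundary dates of the example; the helper day_partition is a hypothetical name used only for illustration.

```python
from datetime import date

start, end = date(2002, 10, 1), date(2004, 10, 31)

# Number of one-day ranges between start and end, both inclusive.
daily_partitions = (end - start).days + 1
print(daily_partitions)  # 762

def day_partition(d: date) -> int:
    # 1-based position of a given day among the daily partitions.
    return (d - start).days + 1

print(day_partition(date(2004, 10, 20)))  # 751
```

The exact count is 762 rather than 760 because the range includes 29 February 2004 and October has 31 days, which is consistent with the approximation used in the text.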
Virtually No Impact to the Monthly Deletes
The monthly process deleting the oldest
month of data would virtually be the
same. Depending on the month, between
28 and 31 smaller partitions would be
deleted instead of one larger partition.
However, the same number of rows would
be deleted, and the run time for the job
would be roughly the same.
Faster Nightly Inserts
Nightly inserts would benefit from the
finer partitioning. Instead of being concentrated
in one or two partitions out
of the 25 large partitions, as in the last
example, the rows would be inserted into
three to five smaller partitions of the 760
daily partitions, well under one percent
of the total. Most of the inserts would be
directed to the one partition that contains
the day's activity. This would increase the
hits per block, thereby improving the
performance of the inserts.
No Impact to Short-Running Queries
Having 760 partitions instead of 25 would
not impact short-running PI access queries.
This is because in this example the partitioning
column is part of the primary
index. In other situations, there could be
a significant impact.
Modest Improvement for Some Ad hoc Queries
Ad hoc queries that analyze two full
months of data would not be impacted.
They would now access about 60 partitions
out of the 760, instead of two out
of 25, roughly the same percentage of the
table. However, when queries vary by the
time of month, there would be some gain
by having the larger number of partitions.
For example, a query submitted on the
fifth day of the current month might
analyze four days for each of two months,
while a query submitted on the last day of
the current month might analyze about 30
days for each of two months. Instead of
two out of 25 monthly partitions (between
32 and 36 days of data), the query on the
fifth day of the current month would
involve eight out of 760 partitions (eight
days of data), which is a smaller percentage
of the table. The query at the end of
the month would examine about 60 out of
760 partitions, which is substantially the
same as two out of 25 monthly partitions.
Analysis queries that examine 24 months
of data would run in about the same time
as they are examining most of the table in
either case.
The number of partitions would not
significantly impact the joins since there
are no direct merge joins against this table
in this scenario.
In summary, for this example, having a
larger number of smaller partitions would
produce modest gains and no degradation
to performance. The greatest gains would
be for queries that analyze only a few days
of transactions, and for the nightly loads.
Additionally, in Teradata Database V2R6, a
day's transactions (that is, a small partition
of data) could be selectively archived or
restored.
A Second Example
While transaction date is frequently a good
choice for the partitioning column, it is not
the only choice. Let's consider a telephone
company's table with detailed information
about phone calls. There is a row for each
outgoing call with the originating phone
number, the timestamp for the start of the
call, and the call duration, among other
things. The rows are retained for a variable
length of time based on the call date and
the monthly bill preparation date. This is
not the same for every customer, and the
retention period is rarely more than six
weeks. The primary index is the phone
number and the call-start timestamp. This
implies the primary index was chosen to
provide good data distribution across the
AMPs. It is also obvious that the primary
index was not chosen for data access or to
facilitate direct merge joins. Some queries
analyze all calls from a particular phone
number. Other queries analyze all calls for
a particular period of time, perhaps for as
long as a month, for customers meeting
certain criteria. A non-PPI definition of
this table, showing only a few critical
columns, follows:
CREATE TABLE CallDetail (
phone_number DECIMAL(10) NOT NULL,
call_start TIMESTAMP,
call_duration INTEGER,
other_columns CHAR(30))
PRIMARY INDEX (phone_number,call_start);
One possibility for partitioning this table
would be to cast call_start as a date and
partition by date, similarly to the solution
in the first example. This would help with
inserting new activity in the same manner
as in the previous example. Deletion of
rows would not get the same performance
gain since the deletes are not strictly by
call date and, therefore, the deleted rows
would not be clustered in a partition. In
this case, the ALTER TABLE statement
could not be used, and the process would
not reap the same performance benefit
that deleting entire partitions provides.
The analysis queries that are based on the
date of the call would benefit with queries
specifying a range of a few days getting the
greatest gain.
Another choice would be to use the phone
number as the partitioning column. Phone
numbers contain too many digits to give
each number its own partition, but a
subset of the digits could be used. If the
first (high-order) three digits are used,
there would be 1000 partitions, some of
which would always be empty because of
the way phone numbers are assigned.
This partitioning expression would not
improve the performance of bulk inserts
or deletes, which would be scattered across
all partitions. It would not help with date-
based queries, but would allow queries
specifying a phone number to run much
faster as only one partition would be read
out of maybe 500 or more non-empty
partitions. A second advantage would be
to benefit geographic area analysis since
(in some parts of the world, at least) the
first three digits identify a particular area.
If 1000 partitions improve performance,
10,000 partitions (the first four digits of
the phone number) would probably be
even better. If 10,000 partitions were good,
maybe 50,000 would be better yet. We
cannot have 100,000 partitions, but we
could use the first five digits and assign
two consecutive numbers to each partition.
Some partitions might be empty due
to the way phone numbers are assigned.
For example, this table definition creates
50,000 partitions using the first five digits
of the phone number:
CREATE TABLE PPI_CallDetail (
phone_number DECIMAL(10) NOT NULL,
call_start TIMESTAMP,
call_duration INTEGER,
other_columns CHAR(30))
PRIMARY INDEX (phone_number,call_start)
PARTITION BY RANGE_N (
CAST(phone_number / 100000.00000 AS INTEGER) BETWEEN 0 AND 99999 EACH 2);
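The effect of this partitioning expression can be sketched in Python. This is a model under stated assumptions: phone numbers are treated as ten-digit non-negative integers, integer division stands in for the truncating cast, and the sample numbers and the function name phone_partition are hypothetical.

```python
def phone_partition(phone_number: int) -> int:
    """Model of partitioning on the first five digits of a ten-digit
    phone number, with two consecutive five-digit values per partition."""
    first_five = phone_number // 100000   # drop the low-order five digits
    return first_five // 2 + 1            # values 0..99999 map to partitions 1..50000

print(phone_partition(2125550123))  # first five digits 21255 -> 10628
print(phone_partition(2125450123))  # first five digits 21254 -> 10628
print(phone_partition(9999999999))  # first five digits 99999 -> 50000
```

The first two sample numbers differ only in the last of their leading five digits and therefore share a partition, which is how 100,000 possible prefixes are folded into 50,000 partitions.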
If it's not important to be able to map a
geographic area to one or more partitions,
another option would be to maximize the
number of partitions by using the partitioning
expression (phone_number mod
65535) + 1. If the table contains about
3.276 billion rows, on average each
partition would contain about 50,000
rows. For a system with 100 AMPs, each
AMP would on average contain about
500 rows per partition, a number of rows
that might fit in one data block if the row
width was fairly small. The decrease in
response time of a one-partition scan for
all activity for a particular phone number
would be dramatic compared to the full-
table scan that would result with the
non-PPI table. A query to return activity
for one phone number is a best-case
scenario for single-table response time
improvement due to PPI. Disregarding the
overhead cost of initiating the query and
returning the answer set, the elapsed time
could be reduced to 1/65535 of the time
using the non-PPI table. Including the
query initiation and termination overhead,
the total improvement would be
somewhat less than a factor of 65,535,
but the query could still take less than
1/10000 of the non-PPI time. Here is a
table definition to use this partitioning:
CREATE TABLE PPI_CallDetail (
phone_number DECIMAL(10) NOT NULL,
call_start TIMESTAMP,
call_duration INTEGER,
other_columns CHAR(30))
PRIMARY INDEX (phone_number,call_start)
PARTITION BY phone_number MOD 65535 + 1;
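The row-count arithmetic for this option can be checked with a short Python sketch. The figures for table size and AMP count are the ones used in the text; the function name mod_partition and the sample phone number are hypothetical, and an even spread of rows across partitions and AMPs is assumed.

```python
MAX_PARTITIONS = 65535

def mod_partition(phone_number: int) -> int:
    # Model of the expression (phone_number MOD 65535) + 1; result is 1..65535.
    return phone_number % MAX_PARTITIONS + 1

table_rows = 3_276_000_000   # "about 3.276 billion rows"
amps = 100

rows_per_partition = table_rows / MAX_PARTITIONS
rows_per_partition_per_amp = rows_per_partition / amps

print(round(rows_per_partition))          # 49989, about 50,000
print(round(rows_per_partition_per_amp))  # 500
print(mod_partition(2125550123))          # some value between 1 and 65535
```

Because consecutive phone numbers land in different partitions, this expression spreads rows evenly but gives up any link between a partition and a geographic area.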
The best choice, if any, of these proposed
partitioning expressions depends on the
mix of anticipated queries. The extended
logical data model can serve as the starting
point for making the decision, but some
amount of testing of different scenarios
will often be required.
A Final Example
The previous examples illustrate scenarios
where a PPI table is the correct choice. For
this example, we examine a more ambiguous
situation in which more trade-off
considerations apply, and the correct
solution is not as evident.
An invoice table contains data about
each invoice issued in the past four years.
The unique primary index is invoice
number. New rows are added nightly using
Teradata MultiLoad, and the oldest month
of data is deleted once per month. There
is a moderately heavy volume of queries
that get information about one specified
invoice. There are ad hoc analysis queries
that examine all invoices for some period
of time, usually less than a year. Other
tables have invoice number as their
primary index, but do not have an invoice
date column. There are frequent joins with
those other tables.
The DBA is considering whether it would
be advantageous to partition the invoice
table on invoice date using one-month
ranges.
The following are some of the considerations
that will apply:
Additional Disk Space Required
The primary index is currently defined as
unique, but would have to be defined as
non-unique if the table were partitioned,
because the partitioning column (invoice
date) is not part of the primary index.
There is a business requirement to guarantee
that invoice numbers are unique.
Therefore, the DBA would have to define
a unique secondary index on the invoice
number column. This secondary index
would increase processing times on insert,
delete, and update operations, and consume
additional disk space. The base table
would also be larger, by two bytes per row,
further increasing the required disk space.
Slower Short-Running Queries
PI access queries would now use the
unique secondary index to access the
row. As a rule of thumb, accessing the
row using a secondary index would take
roughly two to three times as long as using
the primary index for the non-PPI table.
On a positive note, PI access is a very
fast operation, usually sub-second.
Doubling or tripling the response time is
likely to go unnoticed by the users who
issue those queries.
Slower Long-Running Queries
Direct merge joins (without partition
elimination) would at best require more
memory and CPU time, and may be
measurably slower compared to a similar
non-PPI table. The amount of performance
degradation will depend on the query
conditions, how many partitions can be
eliminated, and the specific join plan
chosen by the Optimizer. Actual measurement
of representative queries will be
required to determine the overall difference
in performance.
Impact on Table Maintenance
Nightly inserts would benefit in the same
way as in the first example for the same
reasons. However, the additional index on
invoice number would partially offset the
benefit. Since Teradata MultiLoad does not
support unique secondary indexes, the
index would need to be dropped prior to
the MultiLoad job and then recreated after
the job. Alternatively, this may be an
opportunity to move to a near-real-time
load strategy using, for example, Teradata
TPump.
The same considerations as in the first
example apply to the monthly deletes.
Similarly, in Teradata Database V2R6,
benefits may occur with archives and
restores of selected partitions.
Faster Ad hoc Queries
Ad hoc queries examining several months
of invoices would benefit in the same
way as in the first example. The benefit
would be greatest when fewer months are
examined.
Would it be worthwhile to convert the
invoice table to use a PPI? The DBA will
need to measure the amount of improve-
ment and degradation in the various types
of queries, and determine how much each
query type contributes to the overall
workload involving this table. This will
provide an estimate of the overall work-
load performance with and without a PPI
table. If the difference between a PPI and
non-PPI table performance is substantial
in either direction, the choice will be
evident for the overall workload. But the
DBA should also consider the relative
importance of the various activities. For
example, if the nightly insert volume is
starting to overwhelm the time set aside
for inserting new activity, even a small
improvement in load time might be
considered sufficiently important to offset
larger degradations in queries. Similarly, if
the response time of PI queries is critical,
even a small degradation in those queries
might be considered unacceptable even if
overall workload performance is improved.
In short, measurement and analysis are
required to come to a rational decision
for this case.
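The weighing described above can be sketched as a small calculation. This is only an illustration: the activity names, the run counts, and every elapsed time below are hypothetical placeholders, not measurements from this paper, and would be replaced by numbers measured on representative queries.

```python
# Toy workload model. All figures are hypothetical placeholders
# standing in for measurements taken by the DBA.

def workload_cost(seconds_per_run, weights):
    """Weighted sum of elapsed time over all activities."""
    return sum(seconds_per_run[name] * weights[name] for name in weights)

# Assumed number of executions per day for each activity on the invoice table.
runs_per_day = {
    "pi_access": 50_000,
    "merge_join": 20,
    "ad_hoc_range": 200,
    "nightly_load": 1,
}

# Assumed elapsed seconds per execution, without and with a PPI.
non_ppi_seconds = {
    "pi_access": 0.01,
    "merge_join": 300.0,
    "ad_hoc_range": 600.0,
    "nightly_load": 7200.0,
}
ppi_seconds = {
    "pi_access": 0.03,      # slower: access through the secondary index
    "merge_join": 360.0,    # slower: direct merge join without elimination
    "ad_hoc_range": 60.0,   # faster: partition elimination
    "nightly_load": 1800.0, # faster: inserts concentrated in one partition
}

non_ppi_total = workload_cost(non_ppi_seconds, runs_per_day)
ppi_total = workload_cost(ppi_seconds, runs_per_day)

print(f"non-PPI: {non_ppi_total:.0f} s/day, PPI: {ppi_total:.0f} s/day")
```

Relative importance can be expressed by scaling a weight. For instance, multiplying the weight of "pi_access" by a large factor models a site where PI response time is critical, and may reverse the outcome.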
Specifics of Defining a PPI Table
The PRIMARY INDEX clause of the
CREATE TABLE statement may be
followed by an optional PARTITION BY
partitioning_expression clause. The parti-
tioning expression is a general expression
allowing wide flexibility in tailoring the
partitioning expression to the unique
characteristics of the table. Two functions,
RANGE_N and CASE_N, are provided
to simplify the creation of partitioning
expressions.
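To make the idea of a partitioning expression concrete, here is a minimal Python model of what RANGE_N and CASE_N compute. It is a simplified sketch, not Teradata's implementation: it assumes monthly ranges that start on the first day of a month, the function names and sample dates are illustrative, and it ignores options such as NO RANGE, NO CASE, and UNKNOWN.

```python
from datetime import date
from typing import Optional


def range_n_monthly(value: date, start: date, end: date) -> Optional[int]:
    """Model of a RANGE_N over [start, end] with one range per month.

    Returns the 1-based partition number, or None when the value falls
    outside the defined ranges. Assumes start is the first day of a month.
    """
    if value < start or value > end:
        return None
    return (value.year - start.year) * 12 + (value.month - start.month) + 1


def case_n(*conditions: bool) -> Optional[int]:
    """Model of CASE_N: 1-based position of the first true condition."""
    for number, condition in enumerate(conditions, start=1):
        if condition:
            return number
    return None


# Illustrative two-year range, one partition per month (24 partitions).
START, END = date(2003, 1, 1), date(2004, 12, 31)

print(range_n_monthly(date(2004, 10, 20), START, END))  # partition for October 2004
amount = 750
print(case_n(amount < 100, amount < 1000, amount >= 1000))
```

In both models the result is an integer partition number counted from one, which matches the requirement that the partitioning expression yield an INTEGER value.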
One or more columns can make up the
partitioning expression, although it is
anticipated that, for most tables, one
column will be specified. The partitioning
columns can be part of the primary index,
but are not required to be. The result of
the partitioning expression must be a
scalar value that is INTEGER or can be
cast to INTEGER. Most deterministic
functions can be used within the expres-
sion. The expression must not require
character or graphic comparisons,
although character or graphic columns
can be referenced in some circumstances.
If the partitioning columns are not all part
of the primary index, the primary index
cannot be defined as unique although a
unique secondary index can be defined on
the same columns as the primary index.
Only base tables can be PPI tables. This
excludes global temporary tables, volatile
tables, join indexes, hash indexes, and
secondary indexes. This restriction does
not mean that a PPI table cannot have
secondary indexes or cannot be referenced
in the definition of a join index or hash
index. It merely means that the PARTI-
TION BY clause is not available on a
CREATE GLOBAL TEMPORARY TABLE,
CREATE VOLATILE TABLE, CREATE
INDEX, CREATE JOIN INDEX, or
CREATE HASH INDEX statement.
In the general case, there can be up to
65,535 partitions numbered from one.
As rows are inserted into the table, the
partitioning expression is evaluated to
determine the proper partition placement
for that row. A two-byte internal represen-
tation of the partition number is
embedded in the row as part of the row
identifier, making PPI rows two bytes
wider than they would be if the table
were not partitioned. Secondary indexes
referencing PPI tables use the wider row
identifier, making those rows wider as
well [1]. Except for the embedded internal
partition number, PPI rows have the same
format as non-PPI rows. A data block can
contain rows from multiple consecutive
partitions. There are no new control
structures to implement the partitioning
expression.
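The space cost of the wider row identifier is simple arithmetic, sketched below. The two-byte figure and the 65,535 limit come from the text; the row counts are hypothetical, and the function name is illustrative.

```python
PARTITION_NUMBER_BYTES = 2   # from the text: two-byte internal partition number
MAX_PARTITIONS = 65_535      # from the text: general-case limit

# 65,535 is the largest value a two-byte unsigned field can hold.
assert MAX_PARTITIONS == 2 ** (8 * PARTITION_NUMBER_BYTES) - 1


def extra_bytes(base_rows: int, index_row_ids: int = 0) -> int:
    """Additional bytes from the wider row identifier.

    base_rows:     number of rows in the PPI table
    index_row_ids: number of row identifiers stored in secondary indexes
                   on that table (each one is also two bytes wider)
    """
    return PARTITION_NUMBER_BYTES * (base_rows + index_row_ids)


# Hypothetical table with one billion rows and no secondary indexes.
growth = extra_bytes(1_000_000_000)
print(f"{growth / 1024 ** 3:.2f} GiB of additional space")
```

For most large tables this overhead is small relative to the row data itself, but it should be included when estimating disk space.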
Sample uses of partitioning expressions
were shown in the discussions of the
examples that were presented earlier. While
the examples were simple, the partitioning
expression is a general expression, which
makes it possible to define complex
partitioning schemes tailored to the
processing needs of individual tables.
However, a simple partitioning expression
(for instance, RANGE_N on a single date
column) may provide the best opportuni-
ties for partition elimination in queries.
The Optimizer does partition elimination
for a query by analyzing the constraints on
the partitioning columns in the context of
the partitioning expression. Constraints
that equate the partitioning columns to
constant expressions provide partition
elimination. Also, range constraints on
the partitioning column where
the partitioning column is compared to
constant expressions and the partitioning
expression is a single column or a
RANGE_N function on a single column
can provide partition elimination. In some
cases, the constant expressions may
contain USING variables and still provide
partition elimination.
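The effect of partition elimination for a range constraint can be illustrated with a toy model. It assumes a table partitioned by month on a single date column, as in the earlier examples; the function names and dates are hypothetical, and the sketch says nothing about how the Optimizer actually performs this analysis.

```python
from datetime import date


def month_partition(value: date, start: date) -> int:
    """1-based monthly partition number, assuming start is the first of a month."""
    return (value.year - start.year) * 12 + (value.month - start.month) + 1


def partitions_to_scan(low: date, high: date, start: date, end: date) -> range:
    """Partitions that can hold rows satisfying low <= column <= high.

    All other partitions are eliminated. An equality constraint is the
    special case low == high and always yields a single partition.
    """
    low, high = max(low, start), min(high, end)
    if low > high:
        return range(0)  # constraint lies entirely outside the defined ranges
    return range(month_partition(low, start), month_partition(high, start) + 1)


# Illustrative two-year table with 24 monthly partitions.
START, END = date(2003, 1, 1), date(2004, 12, 31)

needed = partitions_to_scan(date(2004, 9, 1), date(2004, 10, 31), START, END)
print(list(needed), "of", month_partition(END, START), "partitions")
```

The mapping from a constant range to a set of partitions is straightforward here because the partitioning expression is a simple range on one column, which is why such expressions tend to give the best elimination.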
Joins on the primary index columns of a
partitioned table that are equated to the
columns of another table are also opti-
mized when there are a small number of
non-eliminated partitions. In this case,
a set of partitions can be directly read in
a sliding window of merge joins and,
thereby, avoid spooling the partitioned
table prior to the join. If the tables are
also joined by equality on the partitioning
columns, a rowkey merge join simplifies the
merge join and improves its performance.
In Teradata Database V2R5.1, dynamic
partition elimination can occur when
there is an equality constraint between
the partitioning column of one table and
a column of another table. This is useful
when rows are looked up in one table and
then matched only to the corresponding
partitions of the partitioned table (using
a product join) instead of being product
joined to the entire table. Teradata
Database V2R6 further extends dynamic
partition elimination to merge joins.
Another enhancement in Teradata Data-
base V2R6 provides partition elimination
on the referencing rowids of a secondary
index. Instead of looking up all the rows
in the base table for a particular index value,
only rows in the base table referenced by
rowids pointing to non-eliminated
partitions are read.
Teradata Database V2R6 also makes a
Non-Unique Secondary Index (NUSI)
access a single-AMP operation if the
NUSI is on the same columns as the
Non-Unique Primary Index (NUPI) with
an equality condition on the NUSI. Note
that a NUSI on the same columns as the
NUPI is only allowed for a PPI table. This
potentially provides a faster access path
than using the NUPI but with the same
single-AMP and rowhash locking charac-
teristics. This can occur when the number
of occurrences of a NUSI value is less than
the number of partitions.
As mentioned earlier, Teradata Database
V2R6 provides for archives and restores of
selected partitions.
The ALTER TABLE statement has been
extended to support PPI. The section
"How PPI Solves the Business Problem -
the First Example" showed how to
drop the partition containing the oldest
transactions and create expansion parti-
tions for future dates. This is a simple
example, but it does illustrate the capability.
The ability to ALTER a PPI table provides
a simple and convenient mechanism for
the DBA to perform periodic maintenance
on a range-based PPI table.
[1] Starting with Teradata Database V2R5, a join index or hash index that references a table using a row identifier uses the wider format whether or not the table has a partitioned primary index.
High-Level Partitioning Guidelines
Here are some general guidelines, with
limited discussion, to help determine
whether and how to partition a table:
1. Large tables are good candidates for
partitioning.
2. Partition on a column that is fre-
quently used as a restrictive query
condition.
3. If other factors are equal, partition on
a column that is part of the primary
index in preference to a column that
is not, unless the primary index is
seldom, if ever, used for access or joins.
4. While there are few restrictions on the
partitioning expression, a partitioning
expression is only useful if the Opti-
mizer can effectively apply partition
elimination to queries. A simple
partitioning expression is more likely
to give the maximum amount of
partition elimination than a more
complex expression. For example, a
RANGE_N function on a date column
can often be an effective partitioning
expression for queries with range
constraints on the partitioning column.
5. Use RANGE_N or CASE_N in prefer-
ence to direct use of a column in
most situations. The Optimizer can
determine the maximum number
of partitions when RANGE_N or
CASE_N is used, and will have to
assume 65,535 partitions otherwise.
Join costing, in particular, can be more
accurate when the actual number of
partitions is known and fairly small
than when the number is assumed to
be 65,535.
6. Unless the PI is rarely used for access
or direct merge joins, keep the number
of partitions fairly small when the
partitioning expression uses columns
that are not part of the PI.
7. The same considerations regarding
the selection of the primary index
apply to PPI tables as to non-PPI tables.
Choose PI columns that provide good
distribution and avoid large clumps of
duplicate PI values, and which are most
commonly used to access individual
rows in the table. Sometimes those two
considerations conflict, and a reason-
able compromise between the two
must be reached.
A more detailed description of partition-
ing guidelines may be found in the
Teradata Orange Book: Partitioned Primary
Index Usage.
High-Level Trade-off Considerations
The greatest potential gain derived from
partitioning a table is the ability to read
a small subset of the table instead of the
entire table. For example, a query that
examines two months of sales from a table
with two years of sales history would read
about one-twelfth of the table instead of
all of it. This can provide a large perform-
ance boost for a wide range of queries, day
after day, and is automatic. SQL authors
need not be aware of the partitioning
structure, and no changes are required to
existing SQL.
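The one-twelfth figure follows directly from the ratio of partitions read to partitions defined. A minimal sketch of that arithmetic, assuming monthly partitions and rows spread evenly across the months (the function name is illustrative):

```python
from fractions import Fraction


def fraction_of_table_read(partitions_needed: int, partitions_total: int) -> Fraction:
    """Share of the table scanned when only some partitions qualify.

    Assumes every partition holds roughly the same number of rows.
    """
    if partitions_total <= 0 or not 0 <= partitions_needed <= partitions_total:
        raise ValueError("need 0 <= partitions_needed <= partitions_total, total > 0")
    return Fraction(partitions_needed, partitions_total)


# Two months of sales out of two years (24 monthly partitions).
share = fraction_of_table_read(2, 24)
print(share)  # 1/12
```

If the data is skewed toward recent months, the share actually read will be larger than this even-distribution estimate.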
A second potential advantage is faster
batch loads. If the table is partitioned
by transaction date, nightly loads of
transactions for the current day can be
dramatically improved. Similarly, deleting
old rows that are no longer needed can be
dramatically faster (nearly instantaneous
in some cases) when the table is partitioned
by transaction date.
Finally, with Teradata Database V2R6,
you can perform archives and restores of
selected partitions. This allows for more
frequent, but less costly archives. For
restores, critical data (for example, in the
most recent partitions) can be restored
quickly and made available to users
without waiting for the entire table to
be restored.
In the above situations, the improvement
may be even greater when the partitioning
structure makes one or more secondary
indexes or join indexes redundant, allow-
ing those indexes to be dropped.
Offsetting these gains are some potential
disadvantages of partitioning. The first
disadvantage is that PI access of the table
may be slower when a partitioning column
is not part of the PI. This disadvantage can
be offset by choosing partitioning columns
that are part of the PI, specifying the
values of the partitioning columns and
the PI columns, or, in some situations, by
defining a secondary index.
A second disadvantage is that direct merge
joins involving a partitioned table may be
slower unless both tables can be identically
partitioned. The disadvantage can be offset
when the query conditions allow some
partitions to be eliminated from the join.
As in all physical design choices, you must
weigh the trade-off considerations and test
assumptions to get the best results.
A more detailed description of trade-off
considerations may be found in the
Teradata Orange Book: Partitioned Primary
Index Usage.
Summary
PPI tables can dramatically improve
performance of certain types of queries,
especially those that access only a small
part of a large table. High-volume data
load and data maintenance times can
also be improved when, for example,
the transaction date is specified as the
partitioning column.
A partitioned primary index is flexible and
easy to use. PPI tables retain the traditional
uses of primary indexes to distribute data
evenly and provide very fast access when
the primary index value is specified in
the query.
No changes to existing SQL are necessary.
Users accessing a PPI table will see no
difference, except perhaps for different
average response times.
Whether and how to partition the primary
index of a table is a physical design choice.
The trade-off considerations associated
with a PPI should be understood and
considered when making the physical
design decisions.
The extended logical data model can serve
as the starting point for making physical
design decisions, but some amount of
testing of different scenarios will often be
required. As with other physical design
decisions, the total workload and relative
importance of the workload components
must be examined to determine whether
the benefits will outweigh the disadvan-
tages for each design decision.
Teradata and NCR are registered trademarks of NCR Corporation. NCR continually enhances products as new technologies and components become available. NCR, therefore, reserves the right to change specifications without prior notice. All features, functions, and operations described herein may not be marketed in all parts of the world. Consult your Teradata representative or visit Teradata.com for more information. No part of this publication may be reprinted or otherwise reproduced without permission from Teradata.
This document, which includes the information contained herein, is the exclusive property of NCR Corporation. Any person is hereby authorized to view, copy, print, and distribute this document subject to the following conditions. This document may be used for non-commercial, informational purposes only and is provided on an AS-IS basis. Any copy of this document or portion thereof must include this copyright notice and all other restrictive legends appearing in this document. Note that any product, process or technology described in the document may be the subject of other intellectual property rights reserved by NCR and are not licensed hereunder. No license rights will be implied. Use, duplication or disclosure by the United States government is subject to the restrictions set forth in DFARS 252.227-7013 (c) (1) (ii) and FAR 52.227-19.
© 2004 NCR Corporation Dayton, OH U.S.A. Produced in U.S.A. All Rights Reserved.
Teradata.com