    Partitioned Primary Indexes

    Jerry Klindt

    (updated by Paul Sinclair)

    October 20, 2004

    Data Warehousing > Database

    Introduction

    Some common business queries generally require a full-table scan of a large table, even though it's predictable that a fairly small percentage of the rows will qualify. One example of such a query is a trend analysis application that compares current month sales to the previous month, or to the same month of the previous year, using a table with several years of sales detail. Another example is an application that compares customer behavior in one geographic region to another region.

    Prior to Teradata Database V2R5, there were few viable opportunities for a Database Administrator (DBA) to structure the data warehouse in a manner that allowed such queries to avoid full-table scans. Starting with Teradata Database V2R5, the DBA has a flexible and powerful tool to structure tables to allow automatic optimization of frequently used queries of this class. That tool is the partitioned primary index (PPI). A PPI allows a table to be partitioned on columns of interest while retaining the traditional use of the primary index (PI) for data distribution and efficient access when the PI values are specified in the query.

    A carefully chosen partitioning expression can result in partial-table scans instead of full-table scans, with dramatic improvements in resource consumption and elapsed time (elapsed time decreases of 99% or more are possible). Batch insert and update times may also be improved when the partitioning column is chosen to match the arrival pattern of the data (elapsed time decreases of 90% or more are possible).

    EB-1889 > 1204 > PAGE 2 OF 14

    Partitioned Primary Indexes

    Table of Contents

    Executive Summary

    Introduction

    Definitions and Basics

    How Much Can PPI Improve Performance?

    How PPI Solves the Business Problem: Example One

    Can the First Example Be Improved Further?

    A Second Example

    A Final Example

    Specifics of Defining a PPI Table

    High-Level Partitioning Guidelines

    High-Level Trade-off Considerations

    Summary

    Executive Summary

    Partitioned primary indexes, introduced in Teradata Database V2R5, provide an opportunity to greatly improve the performance of certain queries, and to improve the performance of high-volume insert, update, and delete operations. The feature is flexible, yet easy to use, and is largely transparent to end users.

    The process for physically defining the partitioning expression, via the CREATE TABLE statement, is simple and straightforward. This paper gives some examples.

    As is true for all physical database design decisions, there are trade-off considerations associated with each possible choice. It's beyond the scope of this paper to discuss the trade-off considerations at length. The objective of this paper is to provide realistic examples and actual performance comparisons using PPI and non-PPI solutions.

    Definitions and Basics

    In the context of PPI, partitioning refers to the physical ordering of rows within the table. The ordering is provided automatically by the database management software and is determined by a user-specified expression called the partitioning expression. Physically, a PPI table is substantially the same as a non-PPI table except for the ordering of rows. More specifically, the PI value is hashed to distribute a row to a particular AMP in an identical fashion for PPI and non-PPI tables. Within each AMP, rows are ordered by PI hash for non-PPI tables, and by partition number first and then by PI hash for PPI tables.

    The partitioning expression is specified on the CREATE TABLE statement in a PARTITION BY clause following the PRIMARY INDEX definition. The result of the expression must be an integer value or a value that can be cast to integer, and the result indicates the partition number. The columns referenced in the partitioning expression are called the partitioning columns. A partition number must be between 1 and 65,535, inclusive; therefore, the maximum number of partitions that can be defined for a table is 65,535.

    Accessing a particular partition of a table means accessing a subset of the table beginning with the data block containing the first row belonging to the partition (on each AMP), and extending to the data block containing the last row belonging to the partition. The number of data blocks will be zero if there are no rows belonging to that partition (although it may be necessary to read one data block to determine that there are no rows for the partition).

    The term partition elimination refers to an automatic optimization in which the Optimizer determines, based on query conditions and the partitioning expression, that some partitions cannot contain qualifying rows, and causes those partitions to be skipped. Partitions that are skipped for a particular query are called eliminated partitions. Generally, the greatest benefit of a PPI table is obtained from partition elimination.
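    The Optimizer's actual elimination logic is internal to Teradata Database. The following Python sketch only models the idea. It assumes each partition is described by an inclusive low/high bound on the partitioning column and that the query has a single range condition on that column. The function name `surviving_partitions` and the quarterly bounds are hypothetical.

```python
from datetime import date


def surviving_partitions(bounds, query_low, query_high):
    """Return the 1-based numbers of partitions that may hold qualifying rows.

    bounds lists one inclusive (low, high) pair per partition, in
    partition-number order. A partition is eliminated when its range
    cannot overlap the query range [query_low, query_high].
    """
    survivors = []
    for number, (low, high) in enumerate(bounds, start=1):
        if high < query_low or low > query_high:
            continue  # no row in this partition can satisfy the condition
        survivors.append(number)
    return survivors


# Hypothetical quarterly partitions for 2004.
quarters = [
    (date(2004, 1, 1), date(2004, 3, 31)),
    (date(2004, 4, 1), date(2004, 6, 30)),
    (date(2004, 7, 1), date(2004, 9, 30)),
    (date(2004, 10, 1), date(2004, 12, 31)),
]

# A condition covering 15 May 2004 through 10 July 2004 leaves only
# partitions 2 and 3 to be scanned.
print(surviving_partitions(quarters, date(2004, 5, 15), date(2004, 7, 10)))  # [2, 3]
```

    A real optimizer reasons over arbitrary partitioning expressions and predicates. The overlap test above is just the simplest case, where partitions are contiguous value ranges.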

    The term direct merge join is used to describe a join in which the table of interest is not spooled in preparation for a merge join. The Optimizer may choose a direct merge join when all columns of the PI are specified in equality join terms.

    The term direct product join is used to describe a join in which the table of interest is not spooled in preparation for a product join. The Optimizer may choose a direct product join when all the partitioning columns are specified in equality join terms.

    The ordering of rows within a table is transparent to application developers, but there are trade-off considerations involving queries with partitioning column conditions, queries that specify one or a few PI values, and queries that perform joins on the PI columns. We will briefly discuss these trade-off considerations in subsequent sections.

    How Much Can PPI Improve Performance?

    The performance gain depends on the number of partitions and the specific query being measured. In the best case, the elapsed time reduction factor for a specific query against a single table can approach the reciprocal of the number of partitions in the table. This means that best-case PPI queries can take less than 1/100 of one percent of the time they would take with a non-PPI table. The best performance improvement occurs when there are many partitions with a reasonably even distribution of rows among the partitions, and partition elimination excludes all except one partition.
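    The best-case arithmetic can be sketched in a few lines of Python. This is an idealized model with a hypothetical function name. It assumes equally sized partitions, elimination of all but one partition, and no query start-up or answer-return overhead, so real queries only approach these values.

```python
MAX_PARTITIONS = 65535  # partition numbers run from 1 to 65,535


def best_case_fraction(num_partitions):
    """Idealized share of a full-table scan's time that remains when
    partition elimination keeps one of num_partitions equal partitions."""
    if not 1 <= num_partitions <= MAX_PARTITIONS:
        raise ValueError("partition count must be between 1 and 65,535")
    return 1 / num_partitions


ONE_HUNDREDTH_OF_ONE_PERCENT = 1 / 10_000

for p in (25, 200, 10_000, MAX_PARTITIONS):
    frac = best_case_fraction(p)
    print(f"{p:>6} partitions: {frac:.4%} of the full-scan time; "
          f"under 1/100 of 1%: {frac < ONE_HUNDREDTH_OF_ONE_PERCENT}")
```

    Only tables with more than 10,000 well-balanced partitions can, even in this idealized model, drop below 1/100 of one percent.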

    Figure 1 shows the results of actual performance tests. The Baseline column is the performance for a non-PPI table, and the PPI column is the performance for a PPI counterpart table. These tests are considered to be realistic, but your results may vary.

    Figure 1. Actual Performance Test Results

    Test description | Baseline | PPI | Improvement
    --- | --- | --- | ---
    Select rows that have a specified value of the partitioning column (200 partitions with roughly the same number of rows each) | 59 seconds | one second | 98% reduction in elapsed time
    Select a month of activity from one partition containing six months of data (11 years of data contained in 40 partitions of unequal size) | 58 seconds | two seconds | 96% reduction in elapsed time
    Delete rows that have a specified value of the partitioning column (200 partitions of equal size) | 239 seconds | one second | more than 99% reduction in elapsed time
    Update one column in each row that has a specified value of the partitioning column (200 partitions of equal size) | 237 seconds | three seconds | 98% reduction in elapsed time
    MultiLoad insert a number of rows equal to 1% of the table size into one partition (of 200) | 1394 rows per second per node | 14,742 rows per second per node | more than ten times faster
    MultiLoad insert a number of rows equal to 1% of the table size into one partition (of 200) with one NUSI defined on the table | 841 rows per second per node | 5666 rows per second per node | more than six times faster

    (For the MultiLoad tests, larger numbers per node are better.)

    How PPI Solves the Business Problem: The First Example

    We start the discussion of when a PPI is most appropriate by showing the differences between a PPI and a non-PPI table for a few examples. For the first example, we stipulate a table and some processing requirements, discuss the options available prior to Teradata Database V2R5, and discuss the optimization opportunities a PPI provides.

    Our hypothetical company has a large sales table containing the details of each transaction for the previous 24 full months plus the current month-to-date. Once per month, the transactions from the oldest month are deleted. Current transactions are added to the table nightly using Teradata MultiLoad. Most transactions are added on the date they occur, but a small percentage of transactions may be reported a few days after they occur. The number of transactions per month is roughly the same for all months.

    Each row contains, among other things, the product code for the item, the transaction date, an identifier for the sales agent, and the quantity sold. The rows are short, and the data blocks are large. The PI is a composite of product code, transaction date, and the agent identification. The non-PPI definition of this table, showing only a few of the most important columns, is as follows:

    CREATE TABLE SalesTable (
      product_code CHAR(8),
      sales_date DATE,
      agent_id CHAR(8),
      quantity_sold INTEGER,
      other_columns CHAR(50))
    PRIMARY INDEX (product_code, sales_date, agent_id);

    There are four major categories of queries against this table:

    > A modest number of short-running queries specify the PI values.

    > Many ad hoc queries have the following general pattern:

        - Compare one month of activity to another month, or
        - Compare current-month-to-date sales to the same days of the previous month or to the same days of the same month of the previous year for a few product code values.

    > Some queries analyze agent performance, usually over an interval of a calendar quarter or less.

    > Some queries examine sales trends over the previous 24 full months, usually for most or all product code values.

    No other tables have the same PI definition. The sales table is frequently joined to relatively small tables containing information about each product code and each sales agent.

    Prior to Teradata Database V2R5, the DBA needed to speed up the ad hoc queries and the agent analysis queries. The DBA considered creating a value-ordered secondary index or join index on the transaction date column, and set up tests for those scenarios. After running and analyzing EXPLAINs, the DBA found that the Optimizer had determined that neither index was selective enough to be an improvement over a full-table scan.

    The DBA then considered splitting the table into 25 separate tables, each containing transactions for a calendar month. Then, the DBA would create a view with a UNION of all the tables for use by the applications that analyze 24 months of sales history. The DBA concluded that this solution could indeed speed up the targeted queries, but it added too much complexity for the end users. Users would have to understand the structure and change the table names in their queries, code more complicated UNION statements, and select appropriate date ranges and product code ranges. The need to know the appropriate table name (from the 25 different tables) would also apply to applications submitting short-running queries that specify the primary index. This solution would also complicate nightly load jobs, especially in the first few days of a month when a few of the transactions would be from the prior month. The solution would also complicate the archive strategy. In the end, this solution was rejected as being too complicated and error-prone.

    With PPI, there's an excellent solution for this example scenario. By adding a PARTITION BY clause to the definition of a replacement PPI table, the DBA can easily create 25 partitions, one for each month (assuming the current date is in October 2004).

    CREATE TABLE PPI_SalesTable (
      product_code CHAR(8),
      sales_date DATE,
      agent_id CHAR(8),
      quantity_sold INTEGER,
      other_columns CHAR(50))
    PRIMARY INDEX (product_code, sales_date, agent_id)
    PARTITION BY RANGE_N (sales_date BETWEEN
      DATE '2002-10-01' AND DATE '2004-10-31' EACH INTERVAL '1' MONTH);

    The RANGE_N function was used in this scenario to specify the beginning and ending dates and the granularity of the partitioning.
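    The date-to-partition mapping that RANGE_N performs for this table can be modeled in Python. This is a sketch under two stated assumptions. First, only this table's range (October 1, 2002 through October 31, 2004) and monthly granularity are modeled. Second, because no NO RANGE partition is defined, the sketch simply returns no partition for a date outside the range. The function names are hypothetical, not Teradata APIs.

```python
from datetime import date

START = date(2002, 10, 1)  # first day covered by the partitioning
END = date(2004, 10, 31)   # last day covered by the partitioning


def months_between(a, b):
    """Whole calendar months from the month of a to the month of b."""
    return (b.year - a.year) * 12 + (b.month - a.month)


def monthly_partition(sales_date):
    """Partition number for sales_date under EACH INTERVAL '1' MONTH,
    or None when the date falls outside START..END."""
    if not START <= sales_date <= END:
        return None
    return months_between(START, sales_date) + 1


NUM_PARTITIONS = months_between(START, END) + 1
print(NUM_PARTITIONS)  # 25
```

    Every day of a calendar month maps to the same partition number. That is why all of a month's rows end up stored together on each AMP.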

    By converting the sales table into a table partitioned by transaction month, many of the queries would run faster (in this scenario) with no significant negative trade-off considerations. Let's examine in more detail each element of the stated workload as it applies to the newly partitioned table.


    Faster Monthly Deletes

    Instead of using Teradata MultiLoad to delete rows, the DBA could submit an ALTER TABLE statement on a monthly basis (see the next example) to drop the oldest partition and delete its rows, and at the same time create a new partition that would contain data for the upcoming month. Additional partitions for future months could be added if desired. A delete of all the rows in a partition is optimized in much the same way that a delete of all rows in a table has historically been optimized. In both cases, there is no need to record the individual rows in the transient journal as they're deleted. The rows for the month being deleted are physically stored contiguously (on each AMP) instead of being scattered more or less evenly among all the data blocks, as in the non-PPI table, so there would be fewer data blocks with rows to be deleted. Most of the deletes would be full-block deletes, so those data blocks would not have to be read or rewritten. Only one data block per AMP would contain rows for both the oldest month and the second-oldest month, and that would be the only data block read, updated, and rewritten. There is also no need to touch any of the rows in the other month partitions. Dropping the oldest partition(s) with an ALTER TABLE statement is a nearly instantaneous operation, assuming that there are no secondary indexes or join indexes that require updates, there are no retained or added partitions (such as NO RANGE) to move the rows into, and the option to make a copy of the deleted rows is not specified. For example, to drop the partition and delete the rows for October 2002, and to create a partition for November 2004, you would submit:

    ALTER TABLE PPI_SalesTable MODIFY PRIMARY INDEX (product_code, sales_date, agent_id)
      DROP RANGE BETWEEN DATE '2002-10-01' AND DATE '2002-10-31'
      ADD RANGE BETWEEN DATE '2004-11-01' AND DATE '2004-11-30'
      WITH DELETE;
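    The ALTER TABLE statement is what actually changes the table. The Python sketch below only models the resulting bookkeeping: which monthly ranges the table covers after each monthly cycle. It does not model data movement or journaling. The helper names are hypothetical.

```python
import calendar
from datetime import date


def month_range(year, month):
    """Inclusive (first day, last day) of one calendar month."""
    last = calendar.monthrange(year, month)[1]
    return (date(year, month, 1), date(year, month, last))


def roll_forward(ranges):
    """Drop the oldest monthly range and append the following month's range."""
    newest_end = ranges[-1][1]
    next_month = newest_end.month % 12 + 1
    next_year = newest_end.year + (1 if next_month == 1 else 0)
    return ranges[1:] + [month_range(next_year, next_month)]


# 25 monthly ranges, October 2002 through October 2004
# (a zero-based month offset of 9 corresponds to October).
ranges = [month_range(2002 + (9 + i) // 12, (9 + i) % 12 + 1) for i in range(25)]

# One monthly maintenance cycle: October 2002 out, November 2004 in.
ranges = roll_forward(ranges)
print(ranges[0][0], ranges[-1][1])
```

    The table keeps 25 monthly partitions after each cycle. Only the window of dates it covers moves forward by one month.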

    Faster Teradata MultiLoad Inserts

    The nightly Teradata MultiLoad insert job would run faster than it did for the non-PPI table. With the non-PPI table, the inserted rows are distributed more or less evenly among all the data blocks of the table. With the PPI table, they would be concentrated in the data blocks for the proper month. This would increase the average "hits per block" count (a key measure of Teradata MultiLoad efficiency) and reduce the number of data blocks that must be read and rewritten.
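    The following toy Python model illustrates the "hits per block" argument. All of its figures are hypothetical: a 250,000-block table, 100,000 rows inserted per night, and equally sized monthly partitions. It further assumes that rows land uniformly at random among the blocks they can reach. For the non-PPI table that is every block. For the PPI table it is only the current month's blocks. The output is an illustration, not a measurement.

```python
def expected_blocks_touched(blocks, rows):
    """Expected number of distinct blocks hit when `rows` rows land
    independently and uniformly at random among `blocks` blocks."""
    return blocks * (1 - (1 - 1 / blocks) ** rows)


def hits_per_block(blocks, rows):
    """Average inserted rows per block touched (higher is better)."""
    return rows / expected_blocks_touched(blocks, rows)


TABLE_BLOCKS = 250_000  # hypothetical table size in data blocks
NIGHTLY_ROWS = 100_000  # hypothetical rows inserted per night
PARTITIONS = 25         # monthly partitions, assumed equal in size

# Non-PPI: the night's rows can land in any block of the table.
scattered = hits_per_block(TABLE_BLOCKS, NIGHTLY_ROWS)
# PPI: the rows land only in the current month's blocks.
concentrated = hits_per_block(TABLE_BLOCKS // PARTITIONS, NIGHTLY_ROWS)

print(f"non-PPI: {scattered:.1f} hits per block, PPI: {concentrated:.1f}")
```

    Under these assumptions, the PPI table's inserts touch far fewer blocks, with several times as many hits on each block touched.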

    Virtually No Change to Short-Running Queries

    Short-running queries that specify primary index values would run approximately as fast as on the non-PPI table. Since the partitioning column is part of the primary index, the PI access performance would not be significantly changed.

    Significant Performance Gains in Ad hoc Queries

    Large gains would be seen in ad hoc queries that, for example, compare a recent month of sales data to a prior month. Due to partition elimination, only two of the 25 partitions would be read instead of the full-table scan required on the non-PPI table. This means that the number of disk reads would be reduced by roughly 92% (only 2/25, or 8%, of the table is read), with a proportional reduction in elapsed time. The 92% figure applies to the step that reads the sales table, not to the sum of all the steps used to accomplish the query. Given the stated assumptions, the other steps should take roughly the same amount of time as for the non-PPI table.

    The same considerations apply to the agent analysis queries. The number of partitions read is determined by the time period specified in the query. Even if the analysis is for twelve full months, reading twelve of 25 partitions still cuts the reads for the step that reads the sales table by roughly 50%.
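    The 92% and roughly 50% figures are plain partition-count arithmetic. They assume equally sized partitions, as the example stipulates. A short Python sketch with a hypothetical helper name:

```python
TOTAL_PARTITIONS = 25  # 24 full months plus the current month


def read_reduction(partitions_read, total=TOTAL_PARTITIONS):
    """Share of the sales-table step's disk reads avoided when only
    partitions_read of total equally sized partitions are scanned."""
    return 1 - partitions_read / total


for label, read in (("month vs. month", 2),
                    ("twelve-month agent analysis", 12),
                    ("24-month trend", 24)):
    print(f"{label}: read {read} of {TOTAL_PARTITIONS}, "
          f"about {read_reduction(read):.0%} fewer reads")
```

    The 24-month case shows why full-history queries see only a small gain: they still read 24 of the 25 partitions.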

    No Degradation to Queries Requiring a Full-Table Scan

    Decision support queries that analyze 24 months of sales data would take roughly the same time and resources as for the non-PPI table. There would be a small gain from reading 24 instead of 25 partitions. If the analysis is for 24 months plus the current month (i.e., the entire table), the resource usage is the same as for the non-PPI table.


    Virtually No Degradation for Joins

    Joins would take roughly the same amount of time. In this example, since there are no other tables with the same primary index, there are no direct merge joins to the sales table. Joins to the product table and agent table would most likely use the same join strategy as when the sales table was not partitioned. The join strategy would typically be either a duplication of a small table followed by a product join to the sales table, or a redistribution of a spool file followed by a merge join. Neither strategy is less efficient with the partitioning of the sales table. Joins could even be faster, depending on the specific query conditions and the possibility of partition elimination.

    More Efficient Archiving and Restoring

    In Teradata Database V2R6, partitions can also be selectively archived, restored, and copied. This can significantly reduce the time to archive data, since only the recently changed partitions need to be archived. Restores of selected partitions can be used to quickly reload critical partitions.

    Additional Disk Space Required

    The partitioned sales table would require somewhat more disk space than the non-partitioned counterpart due to the two-byte partition number recorded in each row. For this example, the percentage of increase would be less than 3%.

    Figure 2 summarizes the improvement opportunities for the example.

    Figure 2. Example of PPI Improvement Opportunities

    Activity | Non-PPI table | PPI table | Improvement | Comments
    --- | --- | --- | --- | ---
    Nightly inserts | Inserted rows scattered throughout the table | Inserted rows concentrated in one partition | Faster performance | No changes to load script needed
    Monthly delete of one month of data | MultiLoad job reads most data blocks, updates most data blocks | ALTER TABLE statement deletes partition | Much faster performance | Easier maintenance
    Primary index access | One data block read | One data block read | No change | No SQL changes needed
    Comparison of current month to prior month | All data blocks read | Two partitions read | Step is 12 times faster (two partitions of 25 read) | No SQL changes needed
    Trend analysis over entire table | All data blocks read | All data blocks read | Little change | Rows are two bytes longer for PPI; 2% more data blocks for 100-byte rows
    Joins | No direct merge joins | No direct merge joins | Little change | No direct merge joins due to choice of primary index
    Archive/Restore (in Teradata Database V2R6) | Entire table | Entire table or selected partitions | Faster archives for selected partitions | Saves having to re-archive data already archived

    Can the First Example Be Improved Further?

    The first PPI solution, outlined above, was to partition by month, since many of the queries use a month as their basic unit of time. Another option to consider is partitioning to a finer level. Let's compare partitioning by month to partitioning by day using the following PARTITION BY clause:

    PARTITION BY RANGE_N (sales_date BETWEEN
      DATE '2002-10-01' AND DATE '2004-10-31' EACH INTERVAL '1' DAY);

    The table would now have about 760 partitions (two years with 365 days each plus the current month of about 30 days). Some small number of partitions, the ones corresponding to future dates in the current month, would be empty.
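    The "about 760" figure can be checked exactly with Python's standard `datetime` module. Because 2004 is a leap year, the range holds slightly more days than the estimate. The sketch also checks the share of partitions touched by a night's inserts, using the five-partition upper bound discussed below.

```python
from datetime import date

first_day = date(2002, 10, 1)
last_day = date(2004, 10, 31)

# EACH INTERVAL '1' DAY yields one partition per calendar day, both ends included.
daily_partitions = (last_day - first_day).days + 1
print(daily_partitions)  # 762

# Share of the daily partitions touched when a night's inserts land in five of them.
print(f"{5 / daily_partitions:.2%}")
```

    The count stays far below the 65,535-partition limit, so daily granularity is well within bounds for this table.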

    Virtually No Impact to the Monthly Deletes

    The monthly process that deletes the oldest month of data would be virtually the same. Depending on the month, between 28 and 31 smaller partitions would be deleted instead of one larger partition. However, the same number of rows would be deleted, and the run time for the job would be roughly the same.

    Faster Nightly Inserts

    Nightly inserts would benefit from the finer partitioning. In the last example, the rows were concentrated in one or two of the 25 large partitions. Here, they would be inserted into three to five smaller partitions of the 760 daily partitions, well under one percent of the total. Most of the inserts would be directed to the one partition that contains the day's activity. This would increase the hits per block, thereby improving the performance of the inserts.

    No Impact to Short-Running Queries

    Having 760 partitions instead of 25 would not impact short-running PI access queries. This is because, in this example, the partitioning column is part of the primary index, so a query that specifies all the PI values also specifies the sales date and only one partition has to be probed. In other situations, there could be a significant impact.

    Modest Improvement for Some Ad hoc Queries

    Ad hoc queries that analyze two full months of data would not be impacted. They would now access about 60 partitions out of the 760, instead of two out of 25, roughly the same percentage of the table. However, when queries vary by the time of month, there would be some gain by having the larger number of partitions. For example, a query submitted on the fifth day of the current month might analyze four days for each of two months, while a query submitted on the last day of the current month might analyze about 30 days for each of two months. Instead of two out of 25 monthly partitions (between 32 and 36 days of data), the query on the fifth day of the current month would involve eight out of 760 partitions (eight days of data), which is a smaller percentage of the table. The query at the end of the month would examine about 60 out of 760 partitions, which is substantially the same as two out of 25 monthly partitions.
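    The comparison above reduces to the share of partitions each query reads. This Python sketch assumes equally sized partitions and uses the exact 762-day count for October 1, 2002 through October 31, 2004. The helper name is hypothetical.

```python
MONTHLY_PARTITIONS = 25
DAILY_PARTITIONS = 762  # October 1, 2002 through October 31, 2004


def share(partitions_read, total):
    """Fraction of a table of equally sized partitions that is read."""
    return partitions_read / total


# With monthly partitions, two whole months are read whatever the day.
monthly = share(2, MONTHLY_PARTITIONS)
# With daily partitions, the share tracks the days actually requested.
day_five = share(8, DAILY_PARTITIONS)    # four days in each of two months
month_end = share(60, DAILY_PARTITIONS)  # about 30 days in each of two months

print(f"monthly: {monthly:.1%}, daily on day 5: {day_five:.1%}, "
      f"daily at month end: {month_end:.1%}")
```

    Early in the month, the daily scheme reads only a small fraction of what the monthly scheme reads. By month end the two are nearly identical.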

    Analysis queries that examine 24 months of data would run in about the same time, as they are examining most of the table in either case.

    The number of partitions would not significantly impact the joins, since there are no direct merge joins against this table in this scenario.

    In summary, for this example, having a larger number of smaller partitions would produce modest gains and no degradation in performance. The greatest gains would be for queries that analyze only a few days of transactions, and for the nightly loads. Additionally, in Teradata Database V2R6, a day's transactions (that is, a small partition of data) could be selectively archived or restored.


    A Second Example

    While transaction date is frequently a good choice for the partitioning column, it is not the only choice. Let's consider a telephone company's table with detailed information about phone calls. There is a row for each outgoing call with the originating phone number, the timestamp for the start of the call, and the call duration, among other things. The rows are retained for a variable length of time based on the call date and the monthly bill preparation date. This is not the same for every customer, and the retention period is rarely more than six weeks. The primary index is the phone number and the call-start timestamp. This implies that the primary index was chosen to provide good data distribution across the AMPs. Because the call-start timestamp is part of the primary index, it is also clear that the primary index was not chosen for data access or to facilitate direct merge joins. Some queries analyze all calls from a particular phone number. Other queries analyze all calls for a particular period of time, perhaps for as long as a month, for customers meeting certain criteria. A non-PPI definition of this table, showing only a few critical columns, follows:

    CREATE TABLE CallDetail (
      phone_number DECIMAL(10) NOT NULL,
      call_start TIMESTAMP,
      call_duration INTEGER,
      other_columns CHAR(30))
    PRIMARY INDEX (phone_number, call_start);

    One possibility for partitioning this table would be to cast call_start as a date and partition by date, similar to the solution in the first example. This would help with inserting new activity in the same manner as in the previous example. Deletion of rows would not get the same performance gain, since the deletes are not strictly by call date and, therefore, the deleted rows would not be clustered in a partition. In this case, the ALTER TABLE statement could not be used, and the process would not reap the same performance benefit that deleting entire partitions provides. The analysis queries that are based on the date of the call would benefit, with queries specifying a range of a few days getting the greatest gain.

    Another choice would be to use the phone number as the partitioning column. Phone numbers contain too many digits to give each number its own partition, but a subset of the digits could be used. If the first (high-order) three digits are used, there would be 1000 partitions, some of which would always be empty because of the way phone numbers are assigned. This partitioning expression would not improve the performance of bulk inserts or deletes, which would be scattered across all partitions. It would not help with date-based queries, but it would allow queries specifying a phone number to run much faster, as only one partition would be read out of maybe 500 or more non-empty partitions. A second advantage would be faster geographic area analysis, since (in some parts of the world, at least) the first three digits identify a particular area.

    If 1000 partitions improve performance, 10,000 partitions (the first four digits of the phone number) would probably be even better. If 10,000 partitions were good, maybe 50,000 would be better yet. We cannot have 100,000 partitions, but we could use the first five digits and assign two consecutive five-digit prefixes to each partition. Some partitions might be empty due to the way phone numbers are assigned. For example, this table definition creates 50,000 partitions using the first five digits of the phone number:

    CREATE TABLE PPI_CallDetail (
      phone_number DECIMAL(10) NOT NULL,
      call_start TIMESTAMP,
      call_duration INTEGER,
      other_columns CHAR(30))
    PRIMARY INDEX (phone_number, call_start)
    PARTITION BY RANGE_N (
      CAST(phone_number / 100000.00000 AS INTEGER) BETWEEN 0 AND 99999 EACH 2);
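    The following Python sketch models how this expression assigns phone numbers to partitions. It assumes, as the expression's design implies, that the CAST drops the fractional part, so the quotient is the first five digits of a ten-digit number. The function name is hypothetical.

```python
def five_digit_partition(phone_number):
    """Model RANGE_N(CAST(phone_number / 100000.00000 AS INTEGER)
    BETWEEN 0 AND 99999 EACH 2) for a non-negative 10-digit number."""
    prefix = phone_number // 100_000  # first five digits (fraction dropped)
    if not 0 <= prefix <= 99_999:
        return None                   # outside the defined ranges
    return prefix // 2 + 1            # ranges 0-1, 2-3, ... map to 1, 2, ...


# The highest prefixes, 99998 and 99999, share the last of the 50,000 ranges.
print(five_digit_partition(9_999_999_999))  # 50000
```

    Consecutive prefix pairs such as 55512 and 55513 share a partition. That keeps the partitions aligned with the leading digits that identify a geographic area.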

    If it's not important to be able to map a geographic area to one or more partitions, another option would be to maximize the number of partitions by using the partitioning expression (phone_number MOD 65535) + 1. If the table contains about 3.276 billion rows, each partition would on average contain about 50,000 rows. For a system with 100 AMPs, each AMP would on average contain about 500 rows per partition, a number of rows that might fit in one data block if the row width is fairly small. Compared to the full-table scan that the non-PPI table would require, a one-partition scan for all activity for a particular phone number would dramatically decrease response time. A query to return activity for one phone number is a best-case scenario for single-table response time improvement due to PPI. Disregarding the overhead cost of initiating the query and returning the answer set, the elapsed time could be reduced to 1/65535 of the time using the non-PPI table. Including the query initiation and termination overhead, the total query time improvement would be somewhat less than a factor of 65,535, but the query could still take less than 1/10,000 of the non-PPI time. Here is a table definition that uses this partitioning:

    CREATE TABLE PPI_CallDetail (
      phone_number DECIMAL(10) NOT NULL,
      call_start TIMESTAMP,
      call_duration INTEGER,
      other_columns CHAR(30))
    PRIMARY INDEX (phone_number, call_start)
    PARTITION BY phone_number MOD 65535 + 1;
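    This Python sketch models the MOD-based expression and the averages quoted above. It assumes that rows spread evenly over partitions and AMPs. The function name is hypothetical. Python's `%` matches MOD for the non-negative values involved here.

```python
PARTITIONS = 65535


def mod_partition(phone_number):
    """Model PARTITION BY phone_number MOD 65535 + 1; MOD is evaluated
    before the addition, so results run from 1 to 65,535."""
    return phone_number % PARTITIONS + 1


ROWS = 3_276_000_000  # "about 3.276 billion rows"
AMPS = 100

rows_per_partition = ROWS / PARTITIONS                  # roughly 50,000
rows_per_partition_per_amp = rows_per_partition / AMPS  # roughly 500

print(round(rows_per_partition), round(rows_per_partition_per_amp))
```

    Every value lands in the valid partition range, and the averages come out at roughly 50,000 rows per partition and 500 per AMP, as the text states.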

    The best choice, if any, of these proposed partitioning expressions depends on the mix of anticipated queries. The extended logical data model can serve as the starting point for making the decision, but some amount of testing of different scenarios will often be required.

    A Final Example

    The previous examples illustrate scenarios where a PPI table is the correct choice. For this example, we examine a more ambiguous situation in which more trade-off considerations apply, and the correct solution is not as evident.

    An invoice table contains data about each invoice issued in the past four years. The unique primary index is invoice number. New rows are added nightly using Teradata MultiLoad, and the oldest month of data is deleted once per month. There is a moderately heavy volume of queries that get information about one specified invoice. There are ad hoc analysis queries that examine all invoices for some period of time, usually less than a year. Other tables have invoice number as their primary index, but do not have an invoice date column. There are frequent joins with those other tables.

    The DBA is considering whether it would be advantageous to partition the invoice table on invoice date using one-month ranges.

    The following are some of the considerations that will apply:

    Additional Disk Space Required

    The primary index is currently defined as unique, but would have to be defined as non-unique if the table were partitioned, because the partitioning column (invoice date) is not part of the primary index. There is a business requirement to guarantee that invoice numbers are unique. Therefore, the DBA would have to define a unique secondary index on the invoice number column. This secondary index would increase processing times on insert, delete, and update operations, and consume additional disk space. The base table would also be larger, by two bytes per row, further increasing the required disk space.
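    The following is a sketch of the partitioned invoice table under consideration. The table name Invoice, its columns, and the four-year date range are hypothetical. Only the structure comes from the discussion above: a non-unique primary index on invoice number, one-month RANGE_N partitions on invoice date, and a unique secondary index to keep invoice numbers unique.

```sql
-- Hypothetical PPI version of the invoice table.
-- The PI must be non-unique because invoice_date is not part of it.
CREATE TABLE Invoice (
  invoice_number INTEGER NOT NULL,
  invoice_date   DATE NOT NULL,
  customer_id    INTEGER,
  invoice_amount DECIMAL(12,2))
PRIMARY INDEX (invoice_number)
PARTITION BY RANGE_N (
  invoice_date BETWEEN DATE '2001-01-01' AND DATE '2004-12-31'
  EACH INTERVAL '1' MONTH);            -- 48 monthly partitions

-- Unique secondary index that enforces the business rule
-- that invoice numbers are unique.
CREATE UNIQUE INDEX (invoice_number) ON Invoice;
```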

    Slower Short-Running Queries

    PI access queries would now use the unique secondary index to access the row. As a rule of thumb, accessing a row through a secondary index takes roughly two to three times as long as access through the primary index of the non-PPI table. On a positive note, PI access is a very fast operation, usually sub-second, so doubling or tripling its response time is likely to go unnoticed by the users who issue those queries.

    Slower Long-Running Queries

    Direct merge joins (without partition elimination) would at best require more memory and CPU time, and may be measurably slower compared to a similar non-PPI table. The amount of performance degradation will depend on the query conditions, how many partitions can be eliminated, and the specific join plan chosen by the Optimizer. Actual measurement of representative queries will be required to determine the overall difference in performance.

    Impact on Table Maintenance

    Nightly inserts would benefit in the same way as in the first example, for the same reasons. However, the additional index on invoice number would partially offset the benefit. Since Teradata MultiLoad does not support unique secondary indexes, the index would need to be dropped prior to the MultiLoad job and then recreated after the job. Alternatively, this may be an opportunity to move to a near-real-time load strategy using, for example, Teradata TPump.
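    The index maintenance around each nightly MultiLoad job might look like the following sketch, which uses the hypothetical table name Invoice. The load itself runs outside SQL and appears only as a comment.

```sql
-- Before the nightly load: MultiLoad cannot maintain a USI.
DROP INDEX (invoice_number) ON Invoice;

-- ... nightly Teradata MultiLoad job runs here ...

-- After the load: rebuild the USI. This should fail if the load
-- introduced duplicate invoice numbers, because uniqueness
-- was not enforced while the index was absent.
CREATE UNIQUE INDEX (invoice_number) ON Invoice;
```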

    The same considerations as in the first example apply to the monthly deletes. Similarly, in Teradata Database V2R6, benefits may occur with archives and restores of selected partitions.

    Faster Ad hoc Queries

    Ad hoc queries examining several months of invoices would benefit in the same way as in the first example. The benefit would be greatest when fewer months are examined.

    Would it be worthwhile to convert the invoice table to use a PPI? The DBA will need to measure the amount of improvement and degradation in the various types of queries, and determine how much each query type contributes to the overall workload involving this table. This will provide an estimate of the overall workload performance with and without a PPI table. If the difference between PPI and non-PPI table performance is substantial in either direction, the choice will be evident for the overall workload. But the DBA should also consider the relative importance of the various activities. For example, if the nightly insert volume is starting to overwhelm the time set aside for inserting new activity, even a small improvement in load time might be considered important enough to offset larger degradations in queries. Similarly, if the response time of PI queries is critical, even a small degradation in those queries might be considered unacceptable even if overall workload performance is improved. In short, measurement and analysis are required to come to a rational decision for this case.
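    One way to turn these measurements into a single estimate is a frequency-weighted sum over query types. The notation is introduced here for illustration only:
    - n_i is how often query type i runs in a representative period.
    - t_i is its measured elapsed time (or resource usage) with and without the PPI.
    - w_i is an optional importance weight reflecting the business priorities just described.

```latex
\Delta T = \sum_{i} w_i \, n_i \left( t_i^{\text{PPI}} - t_i^{\text{non-PPI}} \right)
```

    A negative ΔT favors the PPI table. Setting every w_i to 1 gives the unweighted workload estimate. Raising w_i for nightly loads or PI lookups expresses their greater importance.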

    Specifics of Defining a PPI Table

    The PRIMARY INDEX clause of the CREATE TABLE statement may be followed by an optional PARTITION BY partitioning_expression clause. The partitioning expression is a general expression allowing wide flexibility in tailoring the partitioning expression to the unique characteristics of the table. Two functions, RANGE_N and CASE_N, are provided to simplify the creation of partitioning expressions.

    One or more columns can make up the partitioning expression, although it is anticipated that, for most tables, one column will be specified. The partitioning columns can be part of the primary index, but are not required to be. The result of the partitioning expression must be a scalar value that is INTEGER or can be cast to INTEGER. Most deterministic functions can be used within the expression. The expression must not require character or graphic comparisons, although character or graphic columns can be referenced in some circumstances.

    If the partitioning columns are not all part of the primary index, the primary index cannot be defined as unique, although a unique secondary index can be defined on the same columns as the primary index.
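    The following hypothetical table illustrates these rules. It partitions on a column outside the primary index using CASE_N, so the primary index must be non-unique. Uniqueness is enforced with a unique secondary index on the same column. The table name, columns, and amount thresholds are assumptions made for the example.

```sql
-- Hypothetical table partitioned by invoice amount bands with CASE_N.
-- CASE_N returns the position of the first true condition, an INTEGER.
CREATE TABLE Invoice_By_Amount (
  invoice_number INTEGER NOT NULL,
  invoice_date   DATE NOT NULL,
  invoice_amount DECIMAL(12,2))
PRIMARY INDEX (invoice_number)          -- cannot be UNIQUE here
PARTITION BY CASE_N (
  invoice_amount <   1000,              -- partition 1
  invoice_amount <  10000,              -- partition 2
  invoice_amount < 100000,              -- partition 3
  NO CASE,                              -- partition 4: all larger amounts
  UNKNOWN);                             -- partition 5: NULL amounts

-- Allowed: a USI on the same column as the primary index.
CREATE UNIQUE INDEX (invoice_number) ON Invoice_By_Amount;
```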

    Only base tables can be PPI tables. This excludes global temporary tables, volatile tables, join indexes, hash indexes, and secondary indexes. This restriction does not mean that a PPI table cannot have secondary indexes or cannot be referenced in the definition of a join index or hash index. It merely means that the PARTITION BY clause is not available on a CREATE GLOBAL TEMPORARY TABLE, CREATE VOLATILE TABLE, CREATE INDEX, CREATE JOIN INDEX, or CREATE HASH INDEX statement.

    In the general case, there can be up to 65,535 partitions, numbered from one. As rows are inserted into the table, the partitioning expression is evaluated to determine the proper partition placement for that row. A two-byte internal representation of the partition number is embedded in the row as part of the row identifier, making PPI rows two bytes wider than they would be if the table were not partitioned. Secondary indexes referencing PPI tables use the wider row identifier, making those rows wider as well.¹ Except for the embedded internal partition number, PPI rows have the same format as non-PPI rows. A data block can contain rows from multiple consecutive partitions. There are no new control structures to implement the partitioning expression.
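    Two pieces of arithmetic follow from this description:
    - A two-byte field holds 16 bits, enough for partition numbers 1 through 65,535.
    - The extra base-table space is roughly two bytes per row.

    The row count below is the 3.276 billion-row call detail example used earlier, chosen only for illustration. Secondary index rows add to this total.

```latex
\begin{aligned}
2\ \text{bytes} = 16\ \text{bits} &\Rightarrow 2^{16} = 65{,}536\ \text{distinct values} \ge 65{,}535\ \text{partition numbers} \\
2\ \text{bytes/row} \times 3.276 \times 10^{9}\ \text{rows} &\approx 6.55 \times 10^{9}\ \text{bytes} \approx 6.6\ \text{GB}
\end{aligned}
```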

    Sample uses of partitioning expressions were shown in the discussions of the examples that were presented earlier. While the examples were simple, the partitioning expression is a general expression, which makes it possible to define complex partitioning schemes tailored to the processing needs of individual tables. However, a simple partitioning expression (for instance, RANGE_N on a single date column) may provide the best opportunities for partition elimination in queries.

    The Optimizer performs partition elimination for a query by analyzing the constraints on the partitioning columns in the context of the partitioning expression. Constraints that equate the partitioning columns to constant expressions provide partition elimination. Range constraints that compare the partitioning column to constant expressions can also provide partition elimination when the partitioning expression is a single column or a RANGE_N function on a single column. In some cases, the constant expressions may contain USING variables and still provide partition elimination.
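    The following queries are a sketch using a hypothetical Invoice table whose invoice_date column is partitioned with RANGE_N in one-month ranges. Their range and equality constraints compare the partitioning column to constants, which is the kind of condition the Optimizer can use to skip partitions. EXPLAIN shows the chosen plan. The exact wording of its output varies by release, so it is not reproduced here.

```sql
-- Range constraint on the partitioning column against constants:
-- only the July and August 2004 partitions need to be read.
EXPLAIN
SELECT invoice_number, invoice_amount
FROM Invoice
WHERE invoice_date BETWEEN DATE '2004-07-01' AND DATE '2004-08-31';

-- Equality on the partitioning column: a single partition qualifies.
EXPLAIN
SELECT COUNT(*)
FROM Invoice
WHERE invoice_date = DATE '2004-07-15';
```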

    Joins that equate the primary index columns of a partitioned table to the columns of another table are also optimized when the number of non-eliminated partitions is small. In this case, a set of partitions can be read directly in a sliding window of merge joins, thereby avoiding spooling the partitioned table prior to the join. If the tables are also joined by equality on the partitioning columns, a rowkey merge join simplifies and improves the performance of the merge join.

    In Teradata Database V2R5.1, dynamic partition elimination can occur when there is an equality constraint between the partitioning column of one table and a column of another table. This is useful when looking up rows in one table and matching them to the corresponding partitions of the partitioned table (using a product join), instead of product joining them to the entire table. Teradata Database V2R6 further extends dynamic partition elimination to merge joins.
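    The following join is a sketch of the pattern described above. It uses two hypothetical tables: Invoice, partitioned by month on invoice_date, and a small Billing_Run table with one row per billing run. The lookup on Billing_Run yields a run_date value that is known only at execution time. The equality join on the partitioning column then lets the Optimizer restrict Invoice to the matching partition. Whether dynamic partition elimination is actually used depends on the release and the plan chosen, and EXPLAIN is the way to check.

```sql
-- Hypothetical lookup table: one row per billing run.
CREATE TABLE Billing_Run (
  run_id   INTEGER NOT NULL,
  run_date DATE NOT NULL)
UNIQUE PRIMARY INDEX (run_id);

-- run_date is known only at execution time, so any partition
-- elimination on Invoice must happen dynamically.
EXPLAIN
SELECT i.invoice_number, i.invoice_amount
FROM Billing_Run b
JOIN Invoice i
  ON i.invoice_date = b.run_date
WHERE b.run_id = 42;
```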

    Another enhancement in Teradata Database V2R6 provides partition elimination on the referencing rowids of a secondary index. Instead of looking up all the rows in the base table for a particular index value, only the base-table rows referenced by rowids pointing to non-eliminated partitions are read.

    Teradata Database V2R6 also makes Non-Unique Secondary Index (NUSI) access a single-AMP operation if the NUSI is on the same columns as the Non-Unique Primary Index (NUPI) and there is an equality condition on the NUSI. Note that a NUSI on the same columns as the NUPI is only allowed for a PPI table. This potentially provides a faster access path than using the NUPI, but with the same single-AMP and rowhash locking characteristics. This can occur when the number of occurrences of a NUSI value is less than the number of partitions.
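    The following is a hypothetical illustration. Account_Txn has a NUPI on account_id and is partitioned by month on txn_date, so a lookup by account_id alone cannot eliminate partitions. Because the table is a PPI table, a NUSI may be defined on the same column as the NUPI. This gives the Optimizer the additional single-AMP access path described above. Table and column names are assumptions.

```sql
CREATE TABLE Account_Txn (
  account_id INTEGER NOT NULL,
  txn_date   DATE NOT NULL,
  txn_amount DECIMAL(12,2))
PRIMARY INDEX (account_id)
PARTITION BY RANGE_N (
  txn_date BETWEEN DATE '2001-01-01' AND DATE '2004-12-31'
  EACH INTERVAL '1' MONTH);

-- NUSI on the same column as the NUPI: permitted only for a PPI table.
CREATE INDEX (account_id) ON Account_Txn;

-- Equality on the NUSI column. In V2R6 this may become a single-AMP
-- NUSI access when an account has fewer rows than there are partitions.
SELECT txn_date, txn_amount
FROM Account_Txn
WHERE account_id = 12345;
```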

    As mentioned earlier, Teradata Database V2R6 provides for archives and restores of selected partitions.

    The ALTER TABLE statement has been extended to support PPI. The first example in the section "How PPI Solves the Business Problem" showed how to drop the partition containing the oldest transactions and create expansion partitions for future dates. This is a simple example, but it does illustrate the capability. The ability to ALTER a PPI table provides a simple and convenient mechanism for the DBA to perform periodic maintenance on a range-based PPI table.
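    The monthly maintenance step might look like the following sketch, which assumes a hypothetical Invoice table partitioned by month from January 2001 through December 2004. It drops the oldest month, deletes its rows, and adds a partition for a new month. The dates are assumptions, and the dropped range must line up with existing partition boundaries.

```sql
-- Roll the partition window forward by one month.
ALTER TABLE Invoice
MODIFY PRIMARY INDEX
  DROP RANGE BETWEEN DATE '2001-01-01' AND DATE '2001-01-31'
             EACH INTERVAL '1' MONTH
  ADD RANGE  BETWEEN DATE '2005-01-01' AND DATE '2005-01-31'
             EACH INTERVAL '1' MONTH
WITH DELETE;   -- rows in the dropped range are deleted
```

    If the old rows must be kept, a WITH INSERT INTO clause naming a save table can be used in place of WITH DELETE.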


    ¹ Starting with Teradata Database V2R5, a join index or hash index that references a table using a row identifier uses the wider format, whether or not the table has a partitioned primary index.

    High-Level Partitioning Guidelines

    Here are some general guidelines, with limited discussion, to help determine whether and how to partition a table:

    1. Large tables are good candidates for partitioning.

    2. Partition on a column that is frequently used as a restrictive query condition.

    3. If other factors are equal, partition on a column that is part of the primary index in preference to a column that is not, unless the primary index is seldom, if ever, used for access or joins.

    4. While there are few restrictions on the partitioning expression, a partitioning expression is only useful if the Optimizer can effectively apply partition elimination to queries. A simple partitioning expression is more likely to give the maximum amount of partition elimination than a more complex expression. For example, a RANGE_N function on a date column can often be an effective partitioning expression for queries with range constraints on the partitioning column.

    5. Use RANGE_N or CASE_N in preference to direct use of a column in most situations. The Optimizer can determine the maximum number of partitions when RANGE_N or CASE_N is used, but must assume 65,535 partitions otherwise. Join costing, in particular, can be more accurate when the actual number of partitions is known and fairly small than when the number is assumed to be 65,535.

    6. Unless the PI is rarely used for access or direct merge joins, keep the number of partitions fairly small when the partitioning expression uses columns that are not part of the PI.

    7. The same considerations regarding the selection of the primary index apply to PPI tables as to non-PPI tables. Choose PI columns that provide good distribution, avoid large clumps of duplicate PI values, and are most commonly used to access individual rows in the table. Sometimes those two considerations conflict, and a reasonable compromise between the two must be reached.
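    Guideline 5 can be illustrated with two hypothetical definitions of the same sales table. When the column is used directly, the Optimizer must assume up to 65,535 partitions. With RANGE_N, it can see that there are at most 51. The table names, columns, and the 1 to 50 region range are assumptions.

```sql
-- Direct use of a column: region_id itself is the partition number,
-- so every value must fall between 1 and 65,535.
CREATE TABLE Sales_Direct (
  sale_id     INTEGER NOT NULL,
  region_id   INTEGER NOT NULL,
  sale_amount DECIMAL(12,2))
PRIMARY INDEX (sale_id)
PARTITION BY region_id;

-- RANGE_N: 50 one-region partitions plus a NO RANGE partition for
-- any other region_id, so the maximum number of partitions is known.
CREATE TABLE Sales_Ranged (
  sale_id     INTEGER NOT NULL,
  region_id   INTEGER NOT NULL,
  sale_amount DECIMAL(12,2))
PRIMARY INDEX (sale_id)
PARTITION BY RANGE_N (region_id BETWEEN 1 AND 50 EACH 1, NO RANGE);
```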

    A more detailed description of partitioning guidelines may be found in the Teradata Orange Book: Partitioned Primary Index Usage.

    High-Level Trade-off Considerations

    The greatest potential gain derived from partitioning a table is the ability to read a small subset of the table instead of the entire table. For example, a query that examines two months of sales from a table with two years of sales history would read about one-twelfth of the table instead of all of it. This can provide a large performance boost for a wide range of queries, day after day, and is automatic. SQL authors need not be aware of the partitioning structure, and no changes are required to existing SQL.
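    The one-twelfth figure assumes monthly partitions that hold roughly equal numbers of rows, so a two-month query touches 2 of the 24 partitions:

```latex
\frac{\text{rows read}}{\text{rows in table}} \approx \frac{2\ \text{months}}{24\ \text{months}} = \frac{1}{12}
```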

    A second potential advantage is faster batch loads. If the table is partitioned by transaction date, nightly loads of transactions for the current day can be dramatically improved. Similarly, deleting old rows that are no longer needed can be dramatically faster (nearly instantaneous in some cases) when the table is partitioned by transaction date.

    Finally, with Teradata Database V2R6, you can perform archives and restores of selected partitions. This allows for more frequent, but less costly, archives. For restores, critical data (for example, in the most recent partitions) can be restored quickly and made available to users without waiting for the entire table to be restored.

    In the above situations, the improvement may be even greater when the partitioning structure makes one or more secondary indexes or join indexes redundant, allowing those indexes to be dropped.

    Offsetting these gains are some potential disadvantages of partitioning. The first disadvantage is that PI access of the table may be slower when a partitioning column is not part of the PI. This disadvantage can be offset by choosing partitioning columns that are part of the PI, by specifying values for the partitioning columns as well as the PI columns in queries, or, in some situations, by defining a secondary index.

    A second disadvantage is that direct merge joins involving a partitioned table may be slower unless both tables can be identically partitioned. The disadvantage can be offset when the query conditions allow some partitions to be eliminated from the join.

    As with all physical design choices, you must weigh the trade-off considerations and test assumptions to get the best results.

    A more detailed description of trade-off considerations may be found in the Teradata Orange Book: Partitioned Primary Index Usage.

    Summary

    PPI tables can dramatically improve performance of certain types of queries, especially those that access only a small part of a large table. High-volume data load and data maintenance times can also be improved when, for example, the transaction date is specified as the partitioning column.

    A partitioned primary index is flexible and easy to use. PPI tables retain the traditional uses of primary indexes to distribute data evenly and provide very fast access when the primary index value is specified in the query.

    No changes to existing SQL are necessary. Users accessing a PPI table will see no difference, except perhaps for different average response times.

    Whether and how to partition the primary index of a table is a physical design choice. The trade-off considerations associated with a PPI should be understood and considered when making the physical design decisions.

    The extended logical data model can serve as the starting point for making physical design decisions, but some amount of testing of different scenarios will often be required. As with other physical design decisions, the total workload and relative importance of the workload components must be examined to determine whether the benefits will outweigh the disadvantages for each design decision.


    Teradata and NCR are registered trademarks of NCR Corporation. NCR continually enhances products as new technologies and components become available. NCR, therefore, reserves the right to change specifications without prior notice. All features, functions, and operations described herein may not be marketed in all parts of the world. Consult your Teradata representative or visit Teradata.com for more information. No part of this publication may be reprinted or otherwise reproduced without permission from Teradata.

    This document, which includes the information contained herein, is the exclusive property of NCR Corporation. Any person is hereby authorized to view, copy, print, and distribute this document subject to the following conditions. This document may be used for non-commercial, informational purposes only and is provided on an AS-IS basis. Any copy of this document or portion thereof must include this copyright notice and all other restrictive legends appearing in this document. Note that any product, process or technology described in the document may be the subject of other intellectual property rights reserved by NCR and are not licensed hereunder. No license rights will be implied. Use, duplication or disclosure by the United States government is subject to the restrictions set forth in DFARS 252.227-7013 (c) (1) (ii) and FAR 52.227-19.

    © 2004 NCR Corporation Dayton, OH U.S.A. Produced in U.S.A. All Rights Reserved.

    Teradata.com