
IBM® DB2® for Linux®, UNIX®, and Windows®

Best Practices Physical Database Design

Sam Lightstone

Program Director and Senior Technical

Staff Member

Information Management Software

Christopher Tsounis

Executive IT Specialist

Information Management Technical Sales

Agatha Colangelo

DB2 Information Development

Steven Tsounis

IT Specialist

Information Management Technical Sales

IBM®

Physical Database Design

Physical Database Design
Executive summary
Introduction to physical database design
  Assumptions about the reader
Goals of physical database design
Datatype selection best practices
  Example of virtual views that represent a lookup table for each column
Table normalization and denormalization best practices
  Normalization
    Third normal form (3NF)
    1NF, 2NF, and 3NF of database design
    Star schema and snowflake models
  Denormalization
  IBM Layered Data Architecture
Index design best practices
  Clustering indexes
Data clustering and multidimensional clustering (MDC) best practices
  Block indexes for MDC tables
  Maintaining clustering automatically during INSERT operations
  Benefits of using MDC
  MDC storage scenario
  MDC run time overhead and benefit considerations
  Determining when to use MDC versus a clustering index
Database partitioning (shared-nothing hash partitioning) best practices
  Balanced Warehouse and Balanced Configuration Units (BCU)
Table (range) partitioning best practices
UNION All View (UAV) partitioning best practices
  Migrating UAVs to table partitioning
Database partitioning, table partitioning, and MDC in the same database design best practices
Roll-in and roll-out of data with table partitioning and MDC best practices
Rolling-in large data volumes using table partitioning best practices
Materialized query table (MQT) best practices
Post-design tools for improving designs for existing databases
  Explain facility best practices
  DB2 Design Advisor best practices
    MDC selection capability of the DB2 Design Advisor
Best Practices
Conclusion
Further reading
  Contributors
Notices
  Trademarks


Executive summary

Physical database design is the single most important factor that impacts database

performance. Physical database design covers all of the design features that relate to the

physical structure of the database such as datatype selection, table normalization and

denormalization, indexes, materialized views, data clustering, multidimensional data

clustering, table (range) partitioning, and database (hash) partitioning.

Good physical database design reduces hardware resource utilization (I/O, CPU, and

network) and improves your administrative efficiency. This, in turn, can help you

achieve the following potential benefits to your business:

• Increased performance of applications that use the database, resulting in better

response times and higher end-user satisfaction

• Reduced IT administrative costs, giving you the ability to manage a wider scope

of databases and respond quicker to changes in application requirements

• Reduced IT hardware costs

• Improved backup and recovery elapsed time

Figure 1 shows an illustration of a physical database system. The three heavy dark-boxed

vertical rectangles indicate three distinct database instances. All other square or

rectangular boxes represent storage blocks on disk. All symbols represent data values

within the table (such as geography or month).

In this example, a table has been hash-partitioned across three instances called P1, P2,

and P3. The table has been range-partitioned by month, allowing data to be easily added

and deleted by month. Indirectly, this also helps with queries that have predicates by

month. Data within each table has been clustered using multidimensional clustering

(MDC), and this serves as a further clustering within each range partition. The rows

within the table are also indexed using regular row-based (RID-based) indexes. A

materialized query table (MQT) is created on the table, which includes aggregated data

(such as average sales by geography), which itself has indexing and MDC.


Figure 1 Illustration of a physical database system
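As a rough illustration of how the design features shown in Figure 1 can combine in one table definition, the following sketch uses DB2 9.5 syntax. The table name SALES, its columns, the date range, the index name, and the MQT name SALES_BY_GEO are hypothetical and are not taken from the figure.

```sql
-- Hypothetical fact table; names, columns, and ranges are illustrative only.
CREATE TABLE SALES (
    SALE_DATE  DATE          NOT NULL,
    GEOGRAPHY  VARCHAR(20)   NOT NULL,
    CUST_ID    INTEGER       NOT NULL,
    AMOUNT     DECIMAL(12,2) NOT NULL
)
    -- Database (hash) partitioning: rows are distributed by a hash of CUST_ID.
    DISTRIBUTE BY HASH (CUST_ID)
    -- Table (range) partitioning: one data partition per month.
    PARTITION BY RANGE (SALE_DATE)
        (STARTING FROM ('2008-01-01') ENDING AT ('2008-12-31') EVERY (1 MONTH))
    -- Multidimensional clustering within each range partition.
    ORGANIZE BY DIMENSIONS (GEOGRAPHY);

-- A regular row-based (RID-based) index on the table.
CREATE INDEX SALES_CUST_IX ON SALES (CUST_ID);

-- An MQT that holds aggregated data. SUM and COUNT are stored so that
-- the average sale per geography can be derived from them.
CREATE TABLE SALES_BY_GEO AS (
    SELECT GEOGRAPHY,
           SUM(AMOUNT) AS TOTAL_AMOUNT,
           COUNT(*)    AS SALE_COUNT
    FROM SALES
    GROUP BY GEOGRAPHY
)
    DATA INITIALLY DEFERRED REFRESH DEFERRED;

-- Populate the MQT; a deferred-refresh MQT is empty until it is refreshed.
REFRESH TABLE SALES_BY_GEO;
```

The distribution key, the range granularity, and the clustering dimension are design choices that depend on the workload; the values above are placeholders only.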


Introduction to physical database design

Database design is performed in three stages:

1. Logical database design: includes gathering of business requirements, and entity

relationship modeling.

2. Conversion of the logical design into table definitions (often performed by an

application developer): includes pre-deployment design, table definitions,

normalization, PK and FK relationships, and basic indexing.

3. Post-deployment physical database design (often performed by a database

administrator): includes improving performance, reducing I/O, and streamlining

administration tasks.

Physical database design covers those aspects of database design that impact the actual

structure of the database on disk, items 2 [1] and 3 in the list above. Although you can

perform logical design independently of the platform that the database will eventually

use, many physical database attributes depend on the specifics and semantics of the

target DBMS. Physical database design includes the following attributes:

• Datatype selection

• Table normalization

• Table denormalization

• Indexing

• Clustering

• MDC

• Database partitioning

• Range partitioning

• UAV partitioning

• MQTs

• Memory allocation

• Database storage topology

• Database storage object allocation

This paper covers all but “Database storage topology” and “Database storage object
allocation,” which are covered in the “Best Practices: Database Storage” white paper. This
white paper and others mentioned throughout this paper are available at the DB2 Best
Practices website at http://www.ibm.com/developerworks/db2/bestpractices/.

[1] This phase is variably referred to in the industry as logical database design or physical database design. It is known as logical database design in the sense that it can be designed independently of the data server or the particular DBMS used. It is also often performed by the same people who perform the early requirements building and entity relationship modeling. Conversely, it is also called physical database design in the sense that it affects the physical structure of the database and its implementation. For the sake of this document we use the latter assumption, and therefore include it as part of physical database design.

Physical database design is as old as databases themselves [2]. The first relational databases

were prototypes (in the early 1970s). As relational database systems advanced, new

techniques were introduced to help improve operational efficiency. The most elementary

problems of database design are table normalization and index selection, both of which

are discussed below.

Today, we can achieve I/O reductions by properly partitioning data, distributing data,

and improving the indexing of data. All of these innovations (which improve database

capabilities, expand the scope of physical database design, and increase the number of

design choices) have resulted in the increased complexity of optimizing database

structures. Although the 1980s and 1990s were dominated by the introduction of new

physical database design capabilities, the years since have been dominated by efforts to

simplify the process through automation and best practices.

The vast majority of physical database design features and attributes have the primary

goal of reducing I/O use at run time. However, to a lesser degree, there are “physical

design aspects” that help improve administrative efficiency and reduce CPU or network

use. In addition, in the DB2 partitioned environment, the database design influences the

degree of parallel processing, for example, parallel query processing.

The best practices presented in this document have been developed with the reality of

today’s database systems in mind and specifically address the features and facilities

available in DB2 9.5.

Assumptions about the reader

It is assumed that you are familiar with the physical database design features described.

Therefore, only a very brief description of each one is provided. The focus of this paper is

on the best practices for applying these features. For details on each respective feature,

refer to the DB2 product documentation.

[2] The relational model for databases was first proposed in 1970 by E. F. Codd at IBM. The first relational database systems to be implemented, using SQL and B+ tree, were IBM’s System R, in 1976, and Ingres at the University of California, Berkeley. The B+ tree, the most commonly used indexing storage structure for user-designed indexes, was first described in the paper “Organization and Maintenance of Large Ordered Indices” by Rudolf Bayer and Edward M. McCreight, 1972.


Goals of physical database design

A high-quality physical database design is one that meets the following goals:

• Minimizes I/O

• Balances design features that optimize query performance concurrently with

transaction performance and maintenance operations

• Improves the efficiency of database management, such as roll-in and roll-out of

data

• Improves the performance of administration tasks, such as index creation or

backup and recovery processing

• Minimizes backup and recovery elapsed time


Datatype selection best practices

When designing a physical database, the selection of appropriate datatypes is an

important consideration that should not be overlooked.

Often, abbreviated or intuitive codes are used to represent a longer value in columns, or

to easily identify what the code represents; for example, an account status column whose

codes are OPN, CLS, and INA (representing an account that can be open, closed, or

inactive).

From a query processing perspective, numeric values can be processed more efficiently

than character values, especially when joining values. Therefore, using a numeric

datatype can provide a slight benefit.

While using numeric datatypes might mean that interpreting the values that are being

stored in a column is more difficult, there are appropriate places where the definitions of

numeric values can be stored for retrieval by end users, such as:

o Storing the definitions as a domain value in a data modeling tool such as

Rational Data Architect, where the values can be published to a larger team

using metadata reporting

o Storing the definition of the values in a table in a database, where the definitions

can be joined to the values to provide context, such as text name or description

(tables that store values of columns and their descriptions are often referred to as

reference tables or lookup tables)

Another concern that is often raised is that, for large databases, this storing of

definitions could lead to the proliferation of reference tables. While this is true, if an

organization chooses to use a reference table for each column that is used to store a code

value, it is possible to consolidate these reference tables into either a single or a few

reference tables. From these consolidated reference tables, virtual views can be created to

represent the lookup table for each column.

Example of virtual views that represent a lookup table for each column

In the following diagram, the TCUSTOMER table has two columns that use code values:

CUST_TYPE and CUST_MKT_SEG. In this scenario, a reference table is created for each

column that uses a code, resulting in two reference tables, TCUST_TYPE_REF and

TCUST_MKT_SEG_REF.


This approach is not flexible because any time a new column is added that uses a code
value, a new reference table must be created.

A possible solution is to consolidate the reference tables into a single reference table
(TREF_MASTER), as shown in the following diagram:

In this diagram, two virtual views, VCUST_TYPE_REF and VCUST_MKT_SEG_REF,

were created from the TREF_MASTER table to represent the reference tables in the

example above.

The benefit to this approach is that end users can still use the reference table (without

having to write complex SQL) by simply accessing the reference views for each column.

In addition, the DBA will only maintain a single table for all of the reference data, and the

proliferation of reference tables is limited.


The VCUST_TYPE_REF view is defined by the following query:

SELECT VALUE      AS CUST_TYPE,
       VALUE_NME  AS CUST_TYPE_NME,
       VALUE_DESC AS CUST_TYPE_DESC
FROM   REFTB.TREF_MASTER
WHERE  TBL_SCHEMA = 'REFTB'
  AND  TABLE      = 'REF_MASTER'
  AND  COLUMN     = 'CUST_TYPE'

Use the following best practices when selecting datatypes:

• Always try to use a numeric datatype over a character datatype, taking the
following considerations into account:

   o When creating a column that will hold a Boolean value (“YES” or “NO”),
     use a DECIMAL(1,0) or similar datatype. Use 0 and 1 as values for the
     column rather than “N” or “Y”.

   o Use integers to represent codes.

   o If there will be fewer than 10 code values for a given column, the
     DECIMAL(1,0) datatype is appropriate. If there will be 10 or more code
     values, use SMALLINT.

• Store the definitions as a domain value in a data modeling tool, such as Rational
Data Architect, where the values can be published to a larger team using
metadata reporting.

• Store the definition of the values in a table in a database, where the definitions
can be joined to the value to provide context, such as “text name” or
“description”.
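The datatype practices above might look as follows in a table definition. This is a minimal sketch: the columns CUST_ID, CUST_NME, and ACTIVE_FLG and the constraint name are hypothetical, and the join assumes that the VCUST_TYPE_REF view described earlier exposes CUST_TYPE as a numeric value.

```sql
-- Hypothetical definition of the TCUSTOMER table; only CUST_TYPE and
-- CUST_MKT_SEG are named in the text, the other columns are illustrative.
CREATE TABLE TCUSTOMER (
    CUST_ID      INTEGER      NOT NULL PRIMARY KEY,
    CUST_NME     VARCHAR(60)  NOT NULL,
    ACTIVE_FLG   DECIMAL(1,0) NOT NULL,   -- Boolean value: 0 = no, 1 = yes
    CUST_TYPE    SMALLINT     NOT NULL,   -- integer code, 10 or more values
    CUST_MKT_SEG SMALLINT     NOT NULL,   -- integer code, 10 or more values
    CONSTRAINT CK_ACTIVE_FLG CHECK (ACTIVE_FLG IN (0, 1))
);

-- Join the numeric code to its reference view to recover the text name.
SELECT c.CUST_ID,
       c.CUST_NME,
       r.CUST_TYPE_NME
FROM   TCUSTOMER c
JOIN   VCUST_TYPE_REF r
       ON r.CUST_TYPE = c.CUST_TYPE;
```

The check constraint is optional; it is included here only to keep the 0 and 1 convention from drifting over time.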


Table normalization and denormalization best practices

Table normalization is the restructuring of a data model by reducing its relations to their

simplest forms. It is a key step in the task of building a logical relational database design.

Normalization helps avoid redundancies and inconsistencies in data; it is typically a

logical data modeling exercise, whose outcome might be implemented in the physical

design.

There are a few goals for deploying a normalized design:

• Eliminate redundant data, for example, storing the same data in more than one

table.

• Enforce valid data dependencies by only storing related data in a table, and

dividing relational data into multiple related tables.

• Maximize the flexibility of the system for future growth in data structures.

Normalization

The dominant strategies for normalization are:

• Third normal form (3NF), which is used in online transaction processing (OLTP)

and many general-purpose databases, including enterprise data warehouses

(also called atomic warehouses).

• Star schema and snowflake, which are dimensional model forms for normalization,

and are used heavily in data warehousing and OLAP.

Specify non-enforced referential integrity (RI) constraints on foreign key (FK) columns to
reduce table access for STAR JOINs without incurring the overhead of enforcing RI.
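A non-enforced (informational) constraint of this kind could be declared as in the following sketch. The table, column, and constraint names are hypothetical.

```sql
-- Hypothetical dimension and fact tables for illustration.
CREATE TABLE DIM_PRODUCT (
    PRODKEY   INTEGER     NOT NULL PRIMARY KEY,
    PROD_DESC VARCHAR(80) NOT NULL
);

CREATE TABLE FACT_SALES (
    PRODKEY   INTEGER NOT NULL,
    SALE_DATE DATE    NOT NULL,
    QTY_SOLD  INTEGER NOT NULL
);

-- The FK is declared but not checked on INSERT, UPDATE, or LOAD.
-- ENABLE QUERY OPTIMIZATION lets the optimizer rely on the relationship.
ALTER TABLE FACT_SALES
    ADD CONSTRAINT FK_SALES_PRODUCT
    FOREIGN KEY (PRODKEY) REFERENCES DIM_PRODUCT (PRODKEY)
    NOT ENFORCED
    ENABLE QUERY OPTIMIZATION;
```

Because the database manager does not verify the relationship, the process that loads the fact table must guarantee that every PRODKEY value exists in the dimension; otherwise queries that rely on the constraint can return incorrect results.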

Third normal form (3NF)

3NF builds on the rules of first normal form (1NF) and second normal form (2NF). A
design in 3NF follows all of these rules:

• Eliminate repeating groups. Make a separate table for each set of related

attributes, and give each table a PK.

• Eliminate duplicate columns and redundant data in each table.

• Move subsets of columnar data that apply to multiple rows of a table into

separate tables.

• Create relationships between the tables by using FKs.


• Eliminate columns not dependent on keys. If attributes do not contribute to a

description of a key, move them into a separate table.

• Remove columns not dependent upon the PK.

1NF, 2NF, and 3NF of database design

The following diagrams demonstrate the first, second, and third normal forms of

database design:

Denormalized model:

First normal form (1NF):

To make the denormalized model comply with 1NF, the repeating group of data

elements, the customer address lines, and the customer names were normalized into

separate tables.


Second normal form (2NF):

For the model to comply with 2NF, it must comply with 1NF, and every non-key attribute
must be fully dependent on the whole of a composite key rather than on only a part of it.

Third normal form (3NF):

For the model to comply with 3NF, any transitive dependencies must be eliminated.

Transitive dependencies occur when a value in a non-key field is determined by the

value of another non-key field that is not part of a candidate key.
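Because the diagrams are not reproduced here, the following sketch illustrates the removal of a transitive dependency. All table and column names are hypothetical and are not taken from the original diagrams.

```sql
-- Before: CUST_TYPE_NME is determined by CUST_TYPE, a non-key column,
-- so it depends on the key CUST_ID only transitively.
CREATE TABLE CUSTOMER_BEFORE_3NF (
    CUST_ID       INTEGER     NOT NULL PRIMARY KEY,
    CUST_NME      VARCHAR(60) NOT NULL,
    CUST_TYPE     SMALLINT    NOT NULL,
    CUST_TYPE_NME VARCHAR(30) NOT NULL
);

-- After (3NF): the dependent attribute moves to its own table,
-- keyed by the column that determines it.
CREATE TABLE CUST_TYPE_REF (
    CUST_TYPE     SMALLINT    NOT NULL PRIMARY KEY,
    CUST_TYPE_NME VARCHAR(30) NOT NULL
);

CREATE TABLE CUSTOMER (
    CUST_ID   INTEGER     NOT NULL PRIMARY KEY,
    CUST_NME  VARCHAR(60) NOT NULL,
    CUST_TYPE SMALLINT    NOT NULL,
    CONSTRAINT FK_CUSTOMER_TYPE
        FOREIGN KEY (CUST_TYPE) REFERENCES CUST_TYPE_REF (CUST_TYPE)
);
```

In the first table, renaming a customer type requires updating every customer row of that type; in the 3NF form it is a single-row update.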


Star schema and snowflake models

The star schema and snowflake models have become quite popular for data warehousing and BI

systems. The basis of star schema is the separation of the facts of a system from its

dimensions. Dimensions are defined as attributes of the data, such as the location, or

customer name, or part description, and the facts refer to the time-specific events related

to the data.

For example, a part description does not typically change over time, so it can be designed

as a dimension. Conversely the number of parts sold daily varies over time and is

therefore a fact. A star schema is called that because it is typically characterized by a

large central fact table that holds information about events that vary over time,

surrounded (conceptually) by a set of dimension tables holding the meta attributes of

items that are referenced within the fact events.

A snowflake is basically an extension of a star schema. In a snowflake design, the low

cardinality attributes are often moved from a dimension table in a star schema into

another dimension table and then a relationship is created between the two dimension

tables.
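The difference between the two models can be sketched as follows, using the part description and daily sales example from the text. The table and column names, and the choice of brand as the low cardinality attribute, are hypothetical.

```sql
-- Star form: the low cardinality brand name is repeated on every product row.
CREATE TABLE PRODUCT_DIM_STAR (
    PRODKEY   INTEGER     NOT NULL PRIMARY KEY,
    PROD_DESC VARCHAR(80) NOT NULL,
    BRAND_NME VARCHAR(40) NOT NULL
);

-- Snowflake form: the brand attribute moves to its own dimension table,
-- and the product dimension refers to it.
CREATE TABLE BRAND_DIM (
    BRAND_KEY SMALLINT    NOT NULL PRIMARY KEY,
    BRAND_NME VARCHAR(40) NOT NULL
);

CREATE TABLE PRODUCT_DIM (
    PRODKEY   INTEGER     NOT NULL PRIMARY KEY,
    PROD_DESC VARCHAR(80) NOT NULL,
    BRAND_KEY SMALLINT    NOT NULL,
    CONSTRAINT FK_PRODUCT_BRAND
        FOREIGN KEY (BRAND_KEY) REFERENCES BRAND_DIM (BRAND_KEY)
);

-- Fact table: one row per product per day (a time-varying event).
CREATE TABLE DAILY_SALES_FACT (
    SALE_DATE DATE    NOT NULL,
    PRODKEY   INTEGER NOT NULL,
    QTY_SOLD  INTEGER NOT NULL
);

-- A snowflake query needs one more join than the equivalent star query.
SELECT b.BRAND_NME,
       SUM(f.QTY_SOLD) AS TOTAL_QTY
FROM   DAILY_SALES_FACT f
JOIN   PRODUCT_DIM p ON p.PRODKEY   = f.PRODKEY
JOIN   BRAND_DIM   b ON b.BRAND_KEY = p.BRAND_KEY
GROUP  BY b.BRAND_NME;
```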

Denormalization

In contrast to normalization, denormalization is the process of collapsing tables and,
therefore, possibly increasing the redundancy of data within a database. Denormalization

can be useful in reducing the complexity or number of joins, and reducing the complexity

of a database by reducing the number of tables. The primary goal of denormalization is

to maximize performance of a system and reduce the complexity in administering the

system.

IBM Layered Data Architecture

IBM Layered Data Architecture offers multiple levels of granularity. Each layer provides

a different level of detail and data summarization appropriate to user needs, which users

(analysts and executives) can access. As data ages, it rolls up through the layers (with

more tables and less data per table). This architecture is designed specifically for mixed

workloads, query performance, rapid incorporation of new data sources, and

deployment of new applications.

The layered architecture enables concurrent loading, query, archive and maintenance

without compromising query performance. The multiple levels of data granularity are

available for multiple types of analytics.

Figure 2 shows the 5 layers (or floors) of the IBM Layered Data Architecture.


Figure 2 IBM Layered Data Architecture

With this model, warehouse administrators can:

1. Use visual modeling tools to optimize the design of multilayered warehouse

schemas.

2. Use their preferred extract, transform, and load (ETL) software to bulk-load the

staging layer of the warehouse—with scale, speed and rich transformations from

myriad enterprise data sources.

3. Use SQL Warehousing Tools (SQWs) to maintain analytic structures in the

performance and business access layers—or to replace hand-coded SQL flows

anywhere inside the warehouse.

This layered architecture is a powerful paradigm that is too detailed to describe at length

here. Refer to “Best Practices for Creating Scalable High Quality Data Warehouses with

DB2” in the “Further reading” section for detailed information on this layered

architecture.

Use the following normalization and denormalization best practices:


• Use 3NF whenever possible for most OLTP and general-purpose database

designs to maintain flexibility in the design of the system. It is a tried-and-true

normalization model.

• For data warehouses and data marts that require very high performance, a star

schema or snowflake model is typically optimal for dimensional query

processing. However, verify that the star schema or snowflake model conforms

to the relationships that you designed in the normalized logical data model.

More information about logical modeling for users of Rational Data Architect is

available in “Best Practices: Data Life Cycle Management” white paper.

• For broad-based data warehousing that is used for several purposes, such as

operational data stores, reporting, OLAP and cubing, use the IBM Layered Data

Architecture illustrated in Figure 2.

• Consider denormalizing very narrow tables, ones with a row length of 30 or

fewer bytes. Extra tables in a database increase query complexity and complicate

administration.


Index design best practices

Indexes are critical for performance. They are used by a database for the following

purposes:

• To apply predicates that provide rapid lookup of the location of data in a database,

reducing the number of rows navigated

• To avoid sorts for ORDER BY and GROUP BY clauses

• To induce order for joins

• To provide index-only access, which avoids the cost of accessing data pages

• As the only way to enforce uniqueness in a relational database

However, indexes consume additional hardware resources:

• They add extra CPU and I/O cost to UPDATE, INSERT, DELETE, and LOAD

operations

• They add to prepare time because they provide more choices for the optimizer

• They can use a significant amount of disk storage

In DB2 database systems, a B+ tree structure is used as the underlying implementation

for indexes. All data is stored in the leaf nodes, and the keys are optionally chained in a

bidirectional manner to allow both forward and backward index scanning. If DISALLOW

REVERSE SCANS is specified then the index cannot be scanned in reverse order.
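The scan-direction options and index-only access mentioned above can be sketched as follows. The table T1, its columns, and the index names are hypothetical.

```sql
-- Hypothetical table for illustration.
CREATE TABLE T1 (
    C1 INTEGER      NOT NULL,
    C2 DATE         NOT NULL,
    C3 DECIMAL(9,2)
);

-- Leaf pages can be scanned forward and backward, so the same index can
-- satisfy both ORDER BY C2 ASC and ORDER BY C2 DESC without a sort.
CREATE INDEX T1_C2_IX ON T1 (C2) ALLOW REVERSE SCANS;

-- This index can be scanned in the forward direction only.
CREATE INDEX T1_C3_IX ON T1 (C3) DISALLOW REVERSE SCANS;

-- A unique index enforces uniqueness on C1. The INCLUDE column is stored
-- in the index but is not part of the unique key, which allows queries
-- such as the one below to be answered from the index alone.
CREATE UNIQUE INDEX T1_C1_UX ON T1 (C1) INCLUDE (C3);

SELECT C1, C3 FROM T1 WHERE C1 = 42;
```

Whether the optimizer actually chooses index-only access for a given query depends on statistics and cost, so the Explain facility remains the way to confirm it.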

Clustering indexes

Clustering indexes (also called special indexes) indicate to the database manager that data in

the table object should be clustered in a specific order, on disk, according to the definition

of the index. For example, if the clustering index is defined on a date key, then the DB2

database manager will attempt to store, in the table object, rows with similar dates in

ascending date sequence.

The table in Figure 3 has two row-based indexes defined on it:

• A clustering index on Region

• Another index on Year


Figure 3. A regular table with a clustering index

The value of this clustering is that subsequent queries that have predicates on the
clustering attribute perform dramatically less I/O. For example, a query on

sales by date will perform far less I/O if the rows for the selected dates are stored next to

each other on disk.

However, clustering indexes are merely an indicator to the database, and as new rows

are inserted into the database the DB2 kernel attempts to place these rows near rows with

the same or similar attributes. If space is unavailable, the incoming or changed row might

be redirected to another location that is unclustered (that is, not near the related rows).

When an INSERT occurs (or an UPDATE to the clustering keys) the DB2 kernel

navigates, top down, scanning the clustering index to determine an appropriate location

for the row. Therefore, INSERT operations, and some UPDATE operations, on a table with
a clustering index incur the overhead of index access that an unclustered table would not.

Techniques like “append on” (APPEND ON option on the CREATE and ALTER TABLE

statements) can minimize this overhead by placing all new rows at the end of the table.

Therefore, clustering indexes provide approximate clustering, and data often becomes

unclustered over time. The REORG utility can be used to reorganize the data rows back

into perfect cluster order, although, for online REORGs, this can be a time-consuming

and log-intensive operation.
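One way to decide when a reorganization is worthwhile is to look at the clustering statistics in the catalog before running REORG. The following sketch assumes a hypothetical table MYSCHEMA.T1 with a clustering index MYSCHEMA.MYINDEX. RUNSTATS and REORG are DB2 command line processor (CLP) commands rather than SQL statements.

```sql
-- Refresh statistics so that the catalog reflects the current clustering.
RUNSTATS ON TABLE MYSCHEMA.T1 AND INDEXES ALL;

-- Inspect how well the data follows the order of each index.
-- CLUSTERRATIO is reported as a percentage; when detailed index statistics
-- are collected, CLUSTERFACTOR is populated instead.
SELECT INDNAME, CLUSTERRATIO, CLUSTERFACTOR
FROM   SYSCAT.INDEXES
WHERE  TABSCHEMA = 'MYSCHEMA'
  AND  TABNAME   = 'T1';

-- Offline (classic) reorganization in the order of the clustering index.
REORG TABLE MYSCHEMA.T1 INDEX MYSCHEMA.MYINDEX;

-- Alternative: online (in-place) reorganization that allows applications
-- to keep reading and writing, at the cost of elapsed time and logging.
REORG TABLE MYSCHEMA.T1 INDEX MYSCHEMA.MYINDEX INPLACE ALLOW WRITE ACCESS;
```

Only one of the two REORG forms would be used in practice; both are shown to contrast the offline and online options discussed above.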


To create clustering indexes, simply add the CLUSTER keyword on the create index

statement as shown in the following example, where a clustering index MyIndex will be

created on column C1 of table T1. There can be only one clustering index per table.

CREATE INDEX MyIndex on T1 (C1) CLUSTER

Because data clustering can deteriorate over time when using a clustering index,
clustering with MDC is preferred as a best practice because it guarantees clustering at all
times and provides the option to cluster along multiple dimensions concurrently. See the
discussion on MDC for help on determining which method to use.

Utilize the following index design best practices:

• Index every PK and most FKs in a database. Most joins occur between PKs and

FKs, so it is important to build indexes on all PKs and FKs whenever possible.

Indexes on FKs also improve the performance of RI checking.

• Explicitly provide an index for the PK. The DB2 database manager indexes the

PK automatically with a system-generated name if one is not specified. The

system-generated name for an automatically-generated index is difficult to

administer.

• Columns frequently referenced in WHERE clauses are good candidates for an

index. An exception to this rule is when the predicate provides minimal filtering.

An example is an inequality such as WHERE cost <> 4 . Indexes are seldom

useful for inequalities because of the limited filtering provided.

• Specify indexes on columns used for equality and range queries.

• Create an index for each set of fact table columns that join to a dimension. These
columns do not have to be part of an explicit FK. Creating the index allows STAR
JOIN access plans that use dynamic bitmap index ANDing. Consider creating
indexes on combinations of fact-table columns.

For example, if PRODKEY and STOREKEY join to the product and store
dimensions, respectively, consider creating an index on (PRODKEY, STOREKEY).
This facilitates a hub or cartesian STAR JOIN access plan.

• Use the db2pd command, which indicates the number of times that indexes were

used in order from highest to lowest. This can be helpful in detecting which

indexes are commonly used. For example:

db2pd -db MY_DATABASE -tcbstats index

The indexes are referenced using the IID, which can be linked with

SYSIBM.SYSINDEXES's IID for the index. At the end of the output (shown below


in two sections) is a list of index statistics. “Scans” indicates read access on

each index, while the other indicators in the output provide insight on write and

update activity to the index.

Left side of report:

Right side of report:

• Use the DB2 Design Advisor to indicate which indexes are never accessed for a

specified workload and can therefore be dropped.

• Add indexes only when absolutely necessary. Remember that indexes

significantly impact INSERT, UPDATE, and DELETE performance, and they also

require storage.

• To reduce the need for frequent reorganization, when using a clustering index

specify an appropriate PCTFREE at index creation time to leave a percentage of

free space on each index leaf page as it is created. During future activity, rows

can be inserted into the index with less likelihood of causing index page splits.

Page splits cause index pages not to be contiguous or sequential, which in turn

results in decreased efficiency of index page prefetching.

Note: The PCTFREE specified when you create the relational index is retained

when the index is reorganized.

Dropping and recreating, or reorganizing, the relational index also creates a new

set of pages that are roughly contiguous and sequential and improves index page

prefetch. Although more costly in time and resources, the REORG TABLE utility

also ensures clustering of the data pages. Clustering has greater benefit for index

scans that access a significant number of data pages.

• Examine queries with range or with ORDER BY clauses to identify clustering

dimensions.

• Clustering indexes incur additional overhead for INSERT and some UPDATE
operations. If your workload performs a large number of inserts and updates,
weigh the benefits of clustering for queries against the additional cost to
INSERT and UPDATE operations. In many cases, the benefit far outweighs the
cost, but not always.


• Avoid or remove redundant indexes. An example of a redundant index is one

that contains only an account number column when there is another index that

contains the same account number column as its first column. Indexes that use

the same or similar columns make query optimization more complicated, use

storage, seriously impact INSERT, UPDATE, and DELETE performance, and

often have very marginal benefits.

Although the DB2 database system provides dynamic bitmap indexing, index

ANDing, and index ORing, it is good practice to specify composite indexes,

referred to as multiple column indexes, if these columns are frequently specified in

WHERE clauses.

• Choose the leading columns of a composite index to facilitate matching index

scans. The leading columns should reflect columns frequently used in WHERE

clauses. The DB2 database system navigates only top down through a B-tree

index for the leading columns used in a WHERE clause, referred to as a matching

index scan. If the leading column of an index is not in a WHERE clause, the

optimizer might still use the index, but the optimizer is forced to use a non-

matching index scan across the entire index.


Data clustering and multidimensional clustering (MDC) best practices

MDC is a technique for clustering data along more than one dimension at the same time.

However, you can also use MDC for single-dimensional clustering, just as you can use a

clustering index. An advantage of an MDC table is that it is designed to always be

clustered. A reorganization is never required to re-establish a high-cluster ratio.

To understand MDC, you must first understand some basic terminology: Cells are the

portion of the table containing data having a unique set of dimension values—the

intersection formed by taking a slice from each dimension. Blocks are the unit of storage

equal to an extent size (one or more pages) that is used to store a cell. Your extent size

specification determines the size of each block; a cell occupies one or more blocks.

Block indexes for MDC tables

Unlike traditional indexes created by the CREATE INDEX syntax, which index each row

in a table, MDC indexes the rows in the table by block, using block indexes. MDC block

indexes are typically 1/1000th of the size of row-based indexes, and provide not only

huge savings in storage for the index, but massive efficiencies on all block index

operations (such as index scan, index ANDing, and index ORing). INSERT and UPDATE

operations are also enhanced because the block index is only updated if a new cell is

created.

As shown in Figure 4, block indexes provide a significant reduction in disk usage and

significantly faster data access:


Figure 4. How row indexes differ from block indexes

The MDC table shown in Figure 5 is physically organized such that rows having the

same Region and Year values are grouped together into separate blocks, or extents.

MDC block indexes are created for each dimension as well as the composite dimension.

For example, if the dimensions for a table are Region,Year then a block index is built

for Region , for Year, and for the composite dimension Region,Year.


Figure 5. A multidimensional clustering (MDC) table

An MDC table defined with even just a single dimension can benefit from these MDC

attributes, and can be a viable alternative to a regular table with a clustering index. This

decision should be based on many factors, including the queries that make up the

workload, and the nature and distribution of the data in the table. A high cardinality

column is not a good choice for a single-dimension MDC because you will get a cell for

each unique value.

Maintaining clustering automatically during INSERT operations

Automatic maintenance of data clustering in MDC tables is ensured using composite block

indexes3. These indexes are used to dynamically manage and maintain the physical

clustering of data along the dimensions of the table over the course of INSERT

operations. When an insert occurs, the composite block index is probed for the logical cell

corresponding to the dimension values of the row to be inserted. The block index is not

updated unless a new cell is created.

3 A composite block index is automatically created and contains all columns across all dimensions. It is used to maintain the clustering of data over insert and update activity, and might also be selected by the optimizer to efficiently access data that satisfies values from a subset, or from all, of the column dimensions.


As shown in Figure 6, if the key of the logical cell is found in the index, its list of block IDs

(BIDs) gives the complete list of blocks in the table having the dimension values of the

logical cell. This limits the number of extents of the table to search for space to insert the

row.

Figure 6. Composite block index on YearAndMonth , Region

Because clustering is automatically maintained, reorganization of an MDC table is never

needed to re-cluster data. Also, MDC can reuse empty cells that result from the mass

deletion of rows without a REORG. However, reorganization can still be used in rare

situations to reclaim space. For example, if cells have many sparse blocks where data

could fit on fewer blocks, or if the table has many pointer-overflow pairs, a

reorganization of the table would compact rows belonging to each logical cell into the

minimum number of blocks needed, as well as remove pointer-overflow pairs.

Benefits of using MDC

The value of MDC is profound. It improves complex query performance by a factor of 10 in

some cases and you can use it for roll-in and roll-out of data. Other benefits include the

following ones:

• MDCs are multi-dimensional. For example, data can be perfectly clustered along

DATE and LOCATION dimensions; cells and ranges are created automatically as

new data arrives.

• MDCs can be used in conjunction with normal RID-based indexes, range

partitioning, and MQTs. Index ANDing or ORing of block-based and RID-based

indexes is a possible access path that can be chosen by the DB2 Optimizer.

• MDCs are used with intra-query parallelism, DPF (shared nothing) parallelism,

and LOAD, BACKUP, and REORG operations.


• MDC dimensions, unlike range-partitioned tables, are dynamic; new cells get

created within the table automatically as unique new data representing new cells

arrives in the table either through SQL operations (including JDBC, CLI, and so

forth), or through utility operations such as LOAD and IMPORT. Empty cells can

also be reused during these operations.

• MDCs maintain clustering, and, as such, do not need REORGs to maintain

cluster ratios.

The following example shows how to define an MDC table:

CREATE TABLE T1 (c1 DATE, c2 INT, c3 INT, c4 DOUBLE, c5 INT generated always as (INT(C1)/100) ) ORGANIZE BY DIMENSIONS (c5, c3)

The ORGANIZE BY clause defines the clustering dimensions. The table is clustered by

C5 and C3 at the same time. C1 is coarsified4 to C5, which contains fewer distinct values

(days are reduced to months).

NOTE: The coarsified generated column(s) are used in the MDC block indexes to

perform cell-level elimination of data. Calculated columns are fully supported by MDC

and the DB2 Optimizer.

The key design challenge of MDC is the careful selection of the clustering dimensions. If

you choose clustering dimensions that result in too many cells, storage costs can increase

substantially. The reason for this is important to understand. In an MDC table, every cell

is allocated as many storage blocks on disk as required. Storage blocks are by design

equal to the extent size of the table space that holds a table. The number of storage blocks

is 0 if a cell has no data.

However, in a typical table a cell stores several rows, resulting in one or more storage

blocks being allocated to the cell. For every cell that has data, there is a chain of blocks,

which typically contains a partially filled block. Therefore, there could be wasted storage

for each cell (not each block), proportional to the size of the storage block. New blocks are

created only when the previous block is full (or nearly full). If rows are deleted and the

cell is empty, the database manager can reuse the space and avoid the need for a

reorganization (for space reclamation).

If the number of cells in the table is very large, the storage waste is large. If the MDC design

is poor and results in a huge number of cells, the table storage requirement expands

dramatically, and MDC can also be a performance detriment. However, when designed

well, MDC tables are only slightly larger than non-MDC tables, and offer profound

benefits for clustering and roll-in and roll-out of data (as discussed in the paragraphs that

follow). The key is to use low-cardinality columns for the dimensions of an MDC.

4 The term coarsification refers to a mathematical expression used to reduce the cardinality (the number of distinct values) of a clustering dimension. A common example of a coarsification is the date, where coarsification could be by date, week of the date, month of the date, or quarter of the year.

Figure 7 shows storage block and cell allocation. As shown, each cell contains a set of

storage blocks. Most of the blocks are filled with data, but for each cell there is a block at

the end of the chain which is partially filled to a lesser or greater degree.

Figure 7 MDC storage by cell

If you have sample or actual data, using SQL, you can measure the number of expected

MDC cells for any given potential MDC design, as follows:

SELECT COUNT(*) FROM (SELECT DISTINCT COL1, COL2, COL3 FROM MY_FAV_TABLE) AS NUM_DISTINCT;

COL1, COL2, and COL3 represent the MDC dimensions for a 3-dimensional MDC table.

The resulting number multiplied by the extent size of the table will give you an upper

bound on the extent growth (not size) of the table when converted to MDC.
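As a rough illustration of that upper bound, the following sketch folds the multiplication into the query itself. MY_FAV_TABLE and COL1 through COL3 are the placeholder names from the query above. The 32-page extent size and the 4 KB page size are assumed values for illustration only, so substitute the settings of your own table space.

```sql
-- Sketch only: estimates the upper bound on MDC growth as one extent per cell.
-- Assumed (hypothetical) table space settings: 32 pages per extent, 4 KB pages.
SELECT NUM_CELLS,
       NUM_CELLS * 32              AS MAX_EXTRA_PAGES,  -- cells x pages per extent
       (NUM_CELLS * 32 * 4) / 1024 AS MAX_EXTRA_MB      -- pages x 4 KB, converted to MB
FROM (SELECT BIGINT(COUNT(*)) AS NUM_CELLS              -- BIGINT avoids integer overflow
      FROM (SELECT DISTINCT COL1, COL2, COL3
            FROM MY_FAV_TABLE) AS CELLS) AS T;
```

With these assumed settings, a design that produces 3,120 distinct cells would bound the growth at 99,840 pages, or 390 MB.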

As described in the previous section, another key value of MDC is that the DB2 database

manager automatically creates indexes for MDC tables over the MDC dimensions of the

table. These special indexes (called block indexes) index data by block instead of by row. This


results in associated run time performance benefits for queries and minimal overhead for

INSERT, UPDATE and DELETE operations.

MDC provides features that facilitate the roll-in and roll-out of data:

o MDC has much less block index I/O during the roll-in process because the block

index is only updated once when the block is full (not for every row inserted).

o Inserts are also faster because MDC reuses existing empty blocks without the

need for index page splitting.

o Locking is reduced for inserts because they occur at a block level rather than at a

row level.

o There is no need to REORG data after roll-in and roll-out.

MDC storage scenario

You want to create an MDC table for a Transaction Fact table on Date, Product Name, and Region.

Here are some variables to consider for the MDC creation:

• There are 365 days in a year

• There are 100,000 products for company XYZ

• There are 10 regions for company XYZ

Initial MDC creation

If the MDC table were created strictly on the Date, Product, and Region columns, there would be

up to 1,000,000 new cells created daily (1 x 100,000 x 10) and up to 365 million cells per year

(previous x 365).

In regions where transactions are low, there will be many sparse pages, and even empty

pages. Allocating so many cells, each with at least one block, to hold so little data

wastes a large amount of space.

Improving the creation of the MDC

Use functions to coarsify and limit MDC cardinality. For example:

• If you use the month function on the Date, you would have 12 results per year

• If you substring the Product name to pick the first character of the Product name,

you could have 26 potential results

• Leave Region as is with 10 results

Using the recommendation in this scenario, every year, the MDC would have 12*26*10 =

3120 cells or about 8-9 cells per day. This would eliminate the sparsity of data on many of


the pages, and provide a reasonable cardinality for the MDC to be effective in providing

a performance benefit.
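The coarsified design in this scenario could be declared roughly as follows. This is a sketch only: the table name, column names, and data types are hypothetical, and it assumes that every product name begins with one of 26 letters. Note that MONTH returns 1 through 12 regardless of the year, so rows from the same month of different years share a cell.

```sql
-- Hypothetical fact table; names and types are illustrative assumptions.
CREATE TABLE TRANSACTION_FACT (
  TRANS_DATE      DATE        NOT NULL,
  PRODUCT_NAME    VARCHAR(60) NOT NULL,
  REGION          CHAR(12)    NOT NULL,
  AMOUNT          DECIMAL(12,2),
  -- Coarsified dimensions, computed and stored with each row
  TRANS_MONTH     INTEGER GENERATED ALWAYS AS (MONTH(TRANS_DATE)),
  PRODUCT_INITIAL CHAR(1) GENERATED ALWAYS AS (SUBSTR(PRODUCT_NAME, 1, 1))
)
ORGANIZE BY DIMENSIONS (TRANS_MONTH, PRODUCT_INITIAL, REGION);

-- After loading representative data, count the cells actually in use
SELECT COUNT(*) AS NUM_CELLS
FROM (SELECT DISTINCT TRANS_MONTH, PRODUCT_INITIAL, REGION
      FROM TRANSACTION_FACT) AS CELLS;
```

Comparing the resulting cell count with the 12 x 26 x 10 estimate gives a quick check on whether the coarsification behaves as expected for your data.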

MDC run time overhead and benefit considerations

MDC is designed to provide large performance benefits for queries and improvement for

many DELETE scenarios. Even so, MDC tables do incur overhead over non-clustered

tables, while offering significant performance benefits over tables that are clustered using

a clustering index. Consider first the overhead of MDC versus an unclustered table:

• INSERT operations on a non-clustered table access each index to add a reference

to the inserted row. In contrast, INSERT on an MDC table requires an initial read

to the MDC composite block index in order to determine to which cell and block

the row belongs, followed (after the insert on the table) by access to each index in

order to insert a reference to the row. (Clustering indexes incur a similar

overhead).

• If the MDC table includes a generated column to coarsify one of the dimensions,

every INSERT will incur a small processing overhead to compute the generated

value for that column as all generated columns in DB2 are fully materialized, that

is, calculated and stored within the row.

However, when compared to a table clustered with the use of a clustering index, MDC

offers significant performance advantages:

• Index maintenance is dramatically reduced during INSERTs compared to the

processing required for a clustering index, as the DB2 database manager only

updates the block index when the first key is added to a block—unlike a RID-

index where every single inserted row to the table requires an update to all

indexes. That is, if there are 1000 rows per block, the rate of index updates is

1/1000th what it would be for a RID index.

• The index update is cheaper, because the index is smaller and therefore has

fewer levels in the tree. Fewer levels in the B+-tree means less processing to

determine the target leaf page for the index entry.

In both cases, whether clustered by a clustering index or by MDC, the DB2 database

manager will access the index (clustering index or the block index) during INSERT to

determine the target location of the row. Again, the block index is much smaller, and the height

of the tree usually shorter resulting in a faster search.

Determining when to use MDC versus a clustering index

MDC provides huge value over a clustering index because the clustering is guaranteed

and automatic. In general you can achieve cluster ratios with MDC anywhere between

93%-100% depending on the coarsification needed. In contrast, clustering indexes can

cluster data close to 100% initially, but the data becomes declustered over time, and might require

time-consuming REORG to recluster the data. In general, use MDC to create and

maintain data clustering in your database unless:


• MDC would require coarsification and you are unable to add a generated

column to your table.

• The MDC version of the table results in table growth you are unable or unwilling

to incur. Well-designed MDC tables are typically 2-15% larger than non-MDC

tables.

• You find that MDC clustering will give you a lower cluster ratio (for example,

93%) due to coarsification and you are willing to incur the periodic REORG

processing in order to get the improved clustering that can be achieved with a

clustering index.

Use the following MDC design best practices:

• Start your selection for MDC candidates by looking for columns that are used as

predicates for equality, inequality, range, and sorting. To improve roll-in of data,

your dimension should match your roll-in range.

• Strive for density! Remember, an extent is allocated for every existing cell—

regardless of the number of rows in that cell. To leverage MDC with optimal

space utilization, strive for densely filled blocks.

• Constrain the number of cells in an MDC design. Keep the number of cells

reasonably low to limit how much additional storage the table will require when

converted to MDC form. 5% to 10% growth for any single table is a reasonable

goal. (See the discussion on MDC cells in the “Benefits of using MDC” section.)

There are exceptions, where even double the amount of growth is useful, but

they are rare.

Note: Block indexes are usually so small as a percentage of the corresponding

table size that, in most cases, you can ignore the storage required for them.

• Coarsify some dimensions to improve data density. Use generated columns to

create coarsifications of a table column that have much lower column cardinality.

For example, create a column on the month-of-year part of a date column, or use

(INT(colname))/100 to convert a DATE column with the format Y-M-D to Y-M.

For example,

CREATE TABLE Sales (SALES_DATE DATE, REGION CHAR(12), PRODUCT CHAR(30),… MONTH GENERATED ALWAYS AS (INTEGER(SALES_DATE)/100)… ORGANIZE BY (MONTH, REGION, PRODUCT)

For the query:


select * from sales where sales_date > '2006-03-03' and sales_date < '2007-01-01' ..

The compiler generates the additional predicates:

month>=200603 and month<=200701

To reduce wasted space, specify a small table space extent size, which reduces

your MDC Block Size.

• Don’t select too many dimensions. It is very rare to find useful designs that have

more than three MDC dimensions without unreasonable storage requirements.

The number of potential cells grows exponentially with the number of dimensions you

choose. This makes it extremely hard to constrain the expansion of the

MDC table to the design goal of approximately 10% (versus a non-MDC table). If

the table expands unreasonably (for example, more than two times its non-MDC

size) not only will you require more storage, but the gains of clustering might be

lost due to the increase in doing I/O on partially filled blocks.

A simple example: Consider a table with three dimensions worth clustering on,

each with 10,000 unique values. If these columns have no correlation between

them, then clustering on all three dimensions without coarsification would result

in 10,000 x 10,000 x 10,000 cells, with a partially filled block per cell. If each block

is 1MB, the overhead from this careless design would be around 500,000 TB!

• Consider single-dimensional MDC. Single-dimensional MDC can still provide

massive benefits compared to those of traditional single dimensional clustered

indexes. The reasons are that:

o Clustering is guaranteed.

o MDC tables are indexed by block and not by row, resulting in indexes that

are roughly 1/1000 the size of traditional row-based indexes.

o DELETE performance using MDC roll-out is improved. RID indexes on

MDC are updated asynchronously with DB2 9.5.

o MDC facilitates roll-in of data.

o Use single-dimensional MDC (with coarsification if needed) to enforce

clustering instead of using a clustering index. Clustering indexes cluster

data on a best effort basis (there are no guarantees of how well they

cluster), and over time, they tend to become unclustered. In contrast

MDC guarantees clustering, avoiding the need to reorganize data. (See

the coarsification example in the “MDC storage scenario” section.)


• Be prepared to tinker (on a test database). It might take trial and error to find an

MDC design that works really well. Use the DB2 Design Advisor with the -m C

option (C for clustering search). You can also use the db2mdcsizer utility, which

determines space requirements and simplifies administration of MDC tables.

This utility is available on AlphaWorks for certain versions of DB2 products.

MDC modifications will not impact your application programs.

• Use the MDC selection capability of the DB2 Design Advisor with a

representative workload to find suitable MDC dimensions for an existing table.


Database partitioning (shared-nothing hash partitioning) best practices

Database partitioning is a technique for horizontally distributing rows in the database

across many database instances that work together to form a single large database server.

These instances can be located within a single server, across several physical machines, or

a combination. In DB2 products, this is called the Database Partitioning Facility (DPF).

Database partitioning allows the DB2 database manager to scale to hundreds of instances

that participate in the larger database system. The scalability of this design can approach

near linear scaleout for many complex query workloads. As such, database partitioning

has become extremely popular for data warehousing and BI workloads due to its near

linear scaleout characteristics and its ability to scale to hundreds of terabytes of data and

hundreds of CPUs. The architecture is less popular for OLTP processing due to the inter-

instance communication incurred on each transaction, which though small, can still be

very significant for short running transactions typically found in OLTP workloads. DPF

might be used for OLTP applications that require a cluster of computers for throughput.

Shared-nothing hash partitioning hashes rows to logical data partitions. The primary

design goal of hash distribution is to ensure the even distribution of data across all

logical nodes (as range partitioning tends to skew data). These partitions might reside

within a single server or be distributed across a set of physical machines, as shown in

Figure 9:

Figure 9 Table hash-partitioning


The scalability of shared-nothing databases has proven to be nearly linear for a wide

range of complex query workloads. Also, the modular nature of the design lends itself to

linear scaleout as storage pressures, workload pressures, or both grow. As a result,

shared-nothing architectures have dominated data warehousing for the past decade.

Database partitioning is implemented without impact on existing application code, and is

completely transparent. Partitioning strategies can be modified online with the

redistribution utility without affecting application code.

The primary design choice is determining which columns make up the database-partitioning

key that is used to hash partition each table. The goals are twofold:

1. Distribute data evenly across database partitions. This requires choosing

partitioning columns that have a high cardinality of values to ensure an even

distribution of rows across the logical partitions.

2. Minimize shipping of data across database partitions during join processing.

Collocation of rows being joined will occur (avoiding movement) if the tables are

partitioned on the join columns, so that the partitioning key appears in the join predicates.
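The two goals above can be sketched with a pair of tables that share a partitioning key. The table and column names here are hypothetical, and the sketch assumes that both tables are created in table spaces belonging to the same database partition group.

```sql
-- Hypothetical tables, both hash-partitioned on the join column CUST_ID
CREATE TABLE CUSTOMER (
  CUST_ID INTEGER NOT NULL,
  NAME    VARCHAR(60)
) DISTRIBUTE BY HASH (CUST_ID);

CREATE TABLE ORDERS (
  ORDER_ID BIGINT  NOT NULL,
  CUST_ID  INTEGER NOT NULL,
  AMOUNT   DECIMAL(12,2)
) DISTRIBUTE BY HASH (CUST_ID);

-- Equijoin on the partitioning key: matching rows hash to the same partition
SELECT C.NAME, SUM(O.AMOUNT) AS TOTAL_AMOUNT
FROM ORDERS O
JOIN CUSTOMER C ON O.CUST_ID = C.CUST_ID
GROUP BY C.NAME;

-- Check how evenly the rows are spread across database partitions
SELECT DBPARTITIONNUM(CUST_ID) AS DB_PARTITION, COUNT(*) AS ROW_COUNT
FROM ORDERS
GROUP BY DBPARTITIONNUM(CUST_ID)
ORDER BY DB_PARTITION;
```

A large spread in ROW_COUNT between partitions suggests that the chosen key has too few distinct values or is skewed, and that another key is worth evaluating.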

Another central problem in designing shared-nothing data warehouses is determining

the best combinations of memory, CPUs, buses, storage capacity, storage bandwidth, and

networks. How much or how many do you need of each of these?

To help solve this problem, IBM provides the IBM Balanced Warehouse™, which is

based on DB2 database system’s shared nothing architecture. It was developed through

IBM best practices used for successful client implementations.

Balanced Warehouse and Balanced Configuration Units (BCU)

The Balanced Warehouse combines building blocks known as Balanced Configuration

Units (BCU). These building blocks are preconfigured, pre-tested, and tuned for

performance to provide an ideal volume and ratio of system resources. The BCU

combines the best practices for database configuration and hardware components to

greatly simplify warehouse setup and deployment. Scores of best practices for resource

ratios and database configuration have been incorporated into the Balanced Warehouse.

Figure 10 shows the various Balanced Warehouse offerings for 2007 and 2008. 5 You can

see that the Balanced Warehouse currently offers three classes of offerings, C, D and E.

These three classes offer increasing power and scalability to the solution. The C class is an

entry level offering intended for SMB markets, or systems integrators that can be

contained in a single server. D and E class offerings scale out to much larger

configurations using DB2 database partitioning capabilities.

5 For an up-to-date version of the Balanced Warehouse offerings refer to the Balanced Warehouse web pages online at: http://www.ibm.com/software/data/infosphere/balanced-warehouse/


Figure 10 Balanced Warehouse offerings6, 2007-2008

Use the following database partitioning best practices:

• Select partitioning keys that have a large number of values (high cardinality) to

ensure even distribution of rows across partitions. Unique keys are good

candidates. If you are having a difficult time finding a key that can distribute

data evenly across partitions, you might want to consider using a function on a

column.

• Avoid choosing a partitioning key with a column that is updated frequently; this

could incur additional overhead on the update to repartition the row to another

partition.

• If possible, as your partitioning key, try to choose a column that has a simple

datatype, such as fixed-length character or integer. The hashing performance can

benefit from doing this versus selecting a complex datatype.

• To increase collocation, consider using the join column as the partitioning key for

a table that is frequently joined (provided that the columns have high cardinality

to satisfy the even distribution of rows). Select the minimum number of columns

required to achieve high cardinality and even distribution of rows in the

partitioning key. Reducing the number of columns in the partitioning key

improves the likelihood that the column will be in the join predicates (improving

the odds of collocation).

6 Prices reflected in the “Estimated Cost” in this table are current as of May 2008, exclude applicable taxes, and are subject to change by IBM without notice.

• Ensure that unique indexes are a superset of the partitioning key.

• Use replicated MQTs for small tables (tables that are less than 3% of the total

database size, or less than 5% of the largest table size are a reasonable rule of

thumb) or infrequently updated tables in order to:

o Improve collocation and reduce movement over the network

o Assist in the collocation of joins

o Improve performance of frequently executed joins in a partitioned

database environment by allowing the database to manage precomputed

values of the table data.

For example:

CREATE TABLE R_EMPLOYEE AS ( SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME, WORKDEPT FROM EMPLOYEE ) DATA INITIALLY DEFERRED REFRESH IMMEDIATE IN REGIONTABLESPACE REPLICATED;

To update the content of the replicated MQT, run the following statement:

REFRESH TABLE R_EMPLOYEE;

Note: After using the REFRESH statement, you should run RUNSTATS on the

replicated table as you would on any other table.

• To collocate the fact table with the largest dimension table, use that dimension table’s key as the partitioning key for the fact table,

considering the number of distinct values and skew within the corresponding

fact-table column.

• Replicate small (less volatile) dimension tables, where “small” is relative and

depends on the installation’s available storage.

• Replicate a horizontal or vertical subset of dimensions that don’t match the

partitioning key, as follows:

o Partition any remaining dimensions on their PK.


o After creating a replicated table to improve collocation, remember to

collect table and index statistics (or use the DB2 automatic statistics

collection feature). Remember to implement the same indexes on the

replicated MQTs as you have defined on the base table(s).

o Define replicated MQTs as REFRESH IMMEDIATE if they are small and

rarely updated. Try to limit the number of parallel ETL jobs executing

when REFRESH IMMEDIATE is specified. A deferred refresh strategy

provides less overhead for updates of the base table.

• Distribute large tables on several partitions. Small tables with less than one

million rows should be located on one database partition only.


Table (range) partitioning best practices

Table partitioning should be used predominantly to facilitate improved roll-in and roll-out

of data. It enables an administrator to add a large range of data (such as a new month of

data) to a table, en-masse, and perhaps more importantly it allows an administrator to

remove data from a table, or from the database, en-masse, almost in an instant (without

data movement).

DB2 database systems' unique asynchronous index-cleanup technology means that even

while using global indexes that index data across several range partitions, a range can be

detached from the table, and the index keys associated with that range become

immediately invisible to incoming queries. The keys are subsequently deleted quietly in a

background process with negligible impact to the executing database workload.

Table partitioning also offers side benefits of increased query performance through an

internal process called partition elimination, which, in many cases, enables the query

compiler to select improved execution plans. This is a secondary benefit of table

partitioning.

Furthermore, table partitioning enables the division of a table into several ranges that are

stored in one or more physical objects within a database logical partition. The goal of

table partitioning is to logically organize data to facilitate optimal data access and the

roll-out of data. The division of the table into ranges is transparent to the application, and

can therefore be designed at any point in the application development cycle.

See “Best Practices: Data Life Cycle Management” white paper for more details on table

partitioning. Other attributes and features of table partitioning include the following

ones:

• Each range can be in a different table space

• Ranges can be scanned independently

• Performance for certain BI-style queries is improved through partition

elimination

• New ALTER ATTACH/DETACH statements for easier roll-in and roll-out of

data:

o New ATTACH operation for roll-in

o New DETACH operation for roll-out

• SET INTEGRITY is now online (allowing read/write access to older data)

• For new ranges, ADD plus LOAD operations can be used over ATTACH plus

SET INTEGRITY operations


The following example shows how to define a partitioned table:

CREATE TABLE SALES(SALE_DATE DATE, CUSTOMER INT, …)
PARTITION BY RANGE(SALE_DATE)
(STARTING '1/1/2006' ENDING '3/31/2006',
 STARTING '4/1/2006' ENDING '6/30/2006',
 STARTING '7/1/2006' ENDING '9/30/2006',
 STARTING '10/1/2006' ENDING '12/31/2006');

This statement results in the creation of four table objects, each one of which stores a

range of data, as shown in Figure 8:

Figure 8 Table partitioning by date range
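The roll-in and roll-out operations listed earlier can be sketched as follows. All table and partition names are hypothetical, and the example uses its own small table with named partitions so that each range can be referenced directly. The LOAD step for the added partition is a utility command and is not shown.

```sql
-- Hypothetical partitioned table with two named quarterly ranges
CREATE TABLE SALES_HIST (
  SALE_DATE DATE NOT NULL,
  CUSTOMER  INT,
  AMOUNT    DECIMAL(12,2)
)
PARTITION BY RANGE (SALE_DATE)
 (PARTITION Q1_2006 STARTING '1/1/2006' ENDING '3/31/2006',
  PARTITION Q2_2006 STARTING '4/1/2006' ENDING '6/30/2006');

-- Roll-in, option 1: add an empty range, then populate it with LOAD
ALTER TABLE SALES_HIST
  ADD PARTITION Q3_2006 STARTING '7/1/2006' ENDING '9/30/2006';

-- Roll-in, option 2: attach a staging table that already holds the new data
CREATE TABLE SALES_Q4_STAGE (
  SALE_DATE DATE NOT NULL,
  CUSTOMER  INT,
  AMOUNT    DECIMAL(12,2)
);
ALTER TABLE SALES_HIST
  ATTACH PARTITION Q4_2006 STARTING '10/1/2006' ENDING '12/31/2006'
  FROM SALES_Q4_STAGE;
-- Validate the attached rows and make them visible
SET INTEGRITY FOR SALES_HIST IMMEDIATE CHECKED;

-- Roll-out: detach the oldest range into a new stand-alone table
ALTER TABLE SALES_HIST
  DETACH PARTITION Q1_2006 INTO SALES_Q1_2006_ARCHIVE;
```

After the detach, the archive table can be backed up or dropped independently of SALES_HIST.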

Use the following table partitioning best practices:

• Use table (range) partitioning to rapidly delete (roll-out) ranges of data. Match

range-partitioning periods to roll-in and roll-out ranges. For example, if you

need to roll-in and roll-out data by month, range partitioning by month is a

reasonable strategy.

• Partition on DATE columns. Roll-in and roll-out scenarios are almost always

based on dates. Query execution plan (QEP) selection can also improve through partition

elimination7, and a significant set of those opportunities are also based on date

predicates.

• Limit the number of ranges. Remember that each range is a table object with a

minimum of two extents. Avoid designs with an excessive number of partitions.

A rule of thumb is at least 50MB of data in each range (several gigabytes of data

per range is best). Make the size of your ranges match the size you typically roll-

out.

7 Partition elimination improves your SQL workload performance. Partition Elimination is a strategy used internally by the query compiler. The query compiler automatically determines if it can exploit the table partitioning for this purpose. Typically dates can satisfy the roll-out requirement and often provide partition elimination benefits to many queries.


• When adding new ranges, ADD table partition with a LOAD operation is often

faster than the ATTACH of a partition with subsequent SET INTEGRITY

operations.

o The LOAD utility has an option to maintain indexes incrementally, and to

write only a single log record for the event, regardless of how many rows are

inserted into the table. Although the LOAD utility supports concurrent read

access to older data, queries need to be drained.

• Consider separating table partitions in separate table spaces to facilitate backup

and recovery. Table partitions (ranges) can be backed up and restored by table

space.

• Place global indexes, which can be large, in their own individual table space.

Placing all the global indexes in a single table space can impact the elapsed time

of the BACKUP utility (because the index table space can become much larger

than the data table spaces).

• Ensure and maintain the clustering of data by making the range-partitioning key

the leading column in a clustered index (no MDC). Data will not be clustered

properly if your clustered index is not prefixed by your partition key. For

example,

PARTITION BY RANGE (Month, Region)
CREATE INDEX … (Month, Region, Department) CLUSTER

• Use page-level sampling to reduce RUNSTATS time. A sampling rate of 10% to

20% provides good quality statistics with a major performance improvement. For

details, see the “Best Practices: Writing and Tuning Queries for Optimal

Performance” white paper.

• Place table partitions in different table spaces; this allows you to back up new

ranges as data is rolled in to the new range, without having to back up the other

partitions. This greatly improves the speed and reduces the size of backup

images.
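Several of the practices above can be combined in one definition. The following is a minimal sketch, not a definitive design: the schema APP, the table and index names, the column list, and the table spaces ts_2008_01 through ts_2008_03 are all hypothetical, and the table spaces are assumed to exist already. It shows monthly ranges on a DATE column, one table space per range, a clustering index prefixed by the partitioning key, and page-level sampling for RUNSTATS.

```sql
-- Hypothetical table: monthly ranges on a DATE column, one table space per range.
-- Table spaces ts_2008_01 .. ts_2008_03 are assumed to exist already.
CREATE TABLE app.monthly_sales (
    sale_date  DATE          NOT NULL,
    region     INTEGER       NOT NULL,
    department INTEGER       NOT NULL,
    amount     DECIMAL(12,2)
)
PARTITION BY RANGE (sale_date) (
    PARTITION p2008_01 STARTING ('2008-01-01') ENDING ('2008-01-31') IN ts_2008_01,
    PARTITION p2008_02 STARTING ('2008-02-01') ENDING ('2008-02-29') IN ts_2008_02,
    PARTITION p2008_03 STARTING ('2008-03-01') ENDING ('2008-03-31') IN ts_2008_03
);

-- Clustering index whose leading column is the range-partitioning key.
CREATE INDEX app.ix_monthly_sales_cl
    ON app.monthly_sales (sale_date, region, department) CLUSTER;

-- Page-level sampling at 10% (RUNSTATS is a CLP command, not an SQL statement).
RUNSTATS ON TABLE app.monthly_sales
    WITH DISTRIBUTION AND INDEXES ALL
    TABLESAMPLE SYSTEM (10);
```

Because each range lives in its own table space, a newly loaded month can be backed up by table space without touching the older ranges.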


UNION ALL View (UAV) partitioning best practices

Prior to the availability of DB2 9 table partitioning, applications often had a requirement

to partition data by ranges. By creating a table for each range with the appropriate

constraints, DBAs were able to provide a single system view by creating a UAV over

all the tables. For example:

CREATE TABLE TestQ1 (dt DATE);
ALTER TABLE TestQ1 ADD CONSTRAINT q1_chk CHECK (MONTH(dt) IN (1, 2, 3));

Repeat the table create/constraint for each quarter:

CREATE VIEW Test AS
    SELECT * FROM TestQ1
    UNION ALL
    SELECT * FROM TestQ2;

Table partitioning provides a single view of the table to the compiler and optimizer. This

allows more aggressive predicate push-down to the different ranges than UAV, and a

more consistent model for partitioning data. Table partitioning is the recommended

method for implementing range-based partitioning for most application requirements.
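For comparison, a single range-partitioned table can cover the same quarterly ranges. This is only a sketch under stated assumptions: the table name Test_tp and the generated column dt_month are introduced here for illustration and are not part of the original example.

```sql
-- Hypothetical single-table counterpart to the per-quarter tables.
-- dt_month is a generated column introduced only for this sketch.
CREATE TABLE Test_tp (
    dt       DATE,
    dt_month INTEGER GENERATED ALWAYS AS (MONTH(dt))
)
PARTITION BY RANGE (dt_month)
    (STARTING FROM (1) ENDING AT (12) EVERY (3));
```

The EVERY (3) clause asks DB2 to generate one range per three month numbers, so the compiler and optimizer see one table rather than a view over several.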

NOTE: UAVs are not a parallel processing method for dividing work across CPUs. The

DB2 Database Partitioning Facility (DPF) should be used for that purpose (see the

discussion on “Database partitioning”).

As with Table partitioning, you can use UAV to store ranges of data in distinct table

spaces, providing granularity for BACKUP operations (see the discussion on “Table

partitioning”).

The advantages of the UAV design predominantly revolve around the ability to operate

on some ranges independent of others, or to design some ranges with unique attributes.

Conversely, table partitioning provides a homogeneous view of a range-partitioned table.

Although table partitioning is generally preferred, there are advantages to UAVs:

• For replication: Historical tables in UAVs can be compressed. (Use UAVs when

replication is needed on certain ranges of data, while other ranges that do not

require replication can benefit from compression.)

• UAVs can reduce the granularity of utility operations (such as REORG

and RUNSTATS), because a utility can operate on a single table containing one range.

NOTE: REORG is commonly the most important of these. This is valuable when

ranges change frequently and require reclustering or recompression. UAVs allow

this operation to be performed on only the subset of ranges that

require it. DB2 9.5 has automatic dictionary rebuild for table partitioning,

alleviating the need to REORG a new range for compression.

• Heavily used ranges can be isolated into separate tables containing additional

indexes or MQTs to optimize data access.

• UAVs provide end users with a single view of federated data (stored in multiple

IBM or non-IBM databases). A UAV can provide a single view of data across

several databases.

Table partitioning provides the following advantages over the UAV partitioning

approach:

• Preparation time is faster (one table instead of multiple tables in a view)

• Simpler management (one table, not multiple tables)

• Less catalog locking for roll-in and roll-out of ranges

• Unique indexes across all ranges supported

• Better handling of complex queries

• Simpler EXPLAINs (using the explain facility)

Migrating UAVs to table partitioning

The migration of UAVs to table partitioning can be achieved without data movement by

following this procedure:

1. Create a partitioned table with a single dummy partition and with a range that does

not interfere with existing ranges. The new table requires the same page size and extent

size as the existing tables.

2. ATTACH all tables in the UAV by using ALTER TABLE … ATTACH PARTITION.

3. Drop the dummy partition.

4. Run SET INTEGRITY after all of the ATTACH operations. To speed up SET

INTEGRITY:

a) Drop all indexes.

b) Recreate indexes after SET INTEGRITY completes.
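The procedure above can be sketched as follows. This is an illustration under stated assumptions rather than a tested migration script: the names Test_part, dummy, q1, q2, and Test_dummy are hypothetical, and TestQ1 and TestQ2 are assumed here to hold first-quarter and second-quarter 2008 rows with a column definition that matches the new table. The statements follow the step order given above; whether the DETACH in step 3 is accepted before SET INTEGRITY has run may depend on your DB2 level, so verify it on a test system.

```sql
-- Step 1: partitioned table with one placeholder range outside the real data.
CREATE TABLE Test_part (dt DATE)
PARTITION BY RANGE (dt) (
    PARTITION dummy STARTING ('1900-01-01') ENDING ('1900-01-31')
);

-- Step 2: attach each UAV member table as a range (no data movement).
ALTER TABLE Test_part ATTACH PARTITION q1
    STARTING ('2008-01-01') ENDING ('2008-03-31') FROM TABLE TestQ1;
ALTER TABLE Test_part ATTACH PARTITION q2
    STARTING ('2008-04-01') ENDING ('2008-06-30') FROM TABLE TestQ2;

-- Step 3: remove the placeholder range by detaching it and dropping the result.
ALTER TABLE Test_part DETACH PARTITION dummy INTO Test_dummy;
DROP TABLE Test_dummy;

-- Step 4: validate the attached rows and make them visible.
SET INTEGRITY FOR Test_part IMMEDIATE CHECKED;
```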

Use the following UAV partitioning best practices:

• Use database partitioning to achieve scalability, rather than UAVs.

• As with table partitioning, use UAVs in order to place ranges of data in distinct

table spaces, improving BACKUP granularity.


Recommendation: Migrate UAVs to table partitioning, taking the following

considerations into account:

• Newly developed applications with range-partition requirements should be

implemented with table partitioning rather than with UAVs, unless you have

strong requirements for one or more of the UAV advantages listed above.

• UNION ALL applications being migrated to table partitions utilizing deep

compression should be implemented with DB2 9.5 in order to benefit from

automatic dictionary compression.


Database partitioning, table partitioning, and MDC in the same database design best practices

Database partitioning, table partitioning, and MDC can be implemented simultaneously

in the same design.

• Database partitioning can be implemented to help achieve scalability and to

ensure the even distribution of data across logical partitions.

• Table Partitioning can be implemented to facilitate Query Partition Elimination

and roll-out of data.

• MDC can be implemented to improve Query Performance and facilitate the roll-

in of data.

This is a best practice approach for deploying large scale applications.

For example:

CREATE TABLE TestTable (A INT, B INT, C INT, D INT …)
    IN Tablespace A, Tablespace B, Tablespace C …
    INDEX IN Tablespace B
    DISTRIBUTE BY HASH (A)
    PARTITION BY RANGE (B) (STARTING FROM (100) ENDING (300) EVERY (100))
    ORGANIZE BY DIMENSIONS (C,D)

See the “Best Practices: Data Life Cycle Management” white paper for more details.

To deploy large scale applications, implement database partitioning, table partitioning,

and MDC in the same database design.


Roll-in and roll-out of data with table partitioning and MDC best practices

Design your partitioning strategy to use table partitioning for your roll-out strategy and

to use MDC on a single dimension for your roll-in strategy.

For example, if you roll-in daily and roll-out monthly, specify an MDC on day and a

Table Partition Key for month (calculated values are supported).

This approach reduces the number of table partitions and eases the DBA administrative

tasks. It takes advantage of the roll-in features of MDC: reduced index I/O with block

indexes and reduced logging.
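A minimal sketch of this combination follows. All table, column, and partition names are hypothetical, and for simplicity the monthly ranges are defined directly on the DATE column instead of on a calculated month value.

```sql
-- Hypothetical fact table: monthly ranges for roll-out, MDC on the day for roll-in.
CREATE TABLE daily_sales (
    sale_date DATE          NOT NULL,
    store_id  INTEGER       NOT NULL,
    amount    DECIMAL(12,2)
)
PARTITION BY RANGE (sale_date) (
    PARTITION p2008_01 STARTING ('2008-01-01') ENDING ('2008-01-31'),
    PARTITION p2008_02 STARTING ('2008-02-01') ENDING ('2008-02-29')
)
ORGANIZE BY DIMENSIONS (sale_date);

-- Daily roll-in: rows for one day are clustered into that day's MDC cell.
INSERT INTO daily_sales (sale_date, store_id, amount)
    SELECT sale_date, store_id, amount
    FROM staging_sales              -- hypothetical staging table
    WHERE sale_date = '2008-02-15';

-- Monthly roll-out: detach the oldest range into a stand-alone table.
ALTER TABLE daily_sales DETACH PARTITION p2008_01 INTO daily_sales_2008_01;
```

With this layout the table needs only one partition per month, while each day's data still arrives as its own set of blocks.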


Use table partitioning for roll-out, and MDC on a single dimension for roll-in.


Rolling-in large data volumes using table partitioning best practices

Applications that need to roll-in very large data volumes can speed up the table

attachment process by ADDing rather than ATTACHing table partitions, which avoids

the need to execute SET INTEGRITY.

There is an alternative to ATTACHing a table partition: you can also use

ALTER TABLE … ADD PARTITION to add an empty partition to the table. After the

empty partition has been added, you can populate it using the LOAD utility (with

read access to older data) or using inserts (logged).

LOAD will help provide superior performance, and can load either from external files or

from a query definition using the “LOAD from cursor” capability.

For applications utilizing Deep Compression, DB2 9.5 facilitates this technique for

rolling-in data because it provides Automatic Dictionary Compression, avoiding the

need to REORG in order to compress data.
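The ADD-then-LOAD technique can be sketched as follows. The sketch assumes that a range-partitioned table named daily_sales with columns sale_date, store_id, and amount, and a staging table named staging_sales, already exist; all of these names are hypothetical. On a recoverable database, a COPY YES or NONRECOVERABLE clause on the LOAD command is typically needed as well.

```sql
-- Add an empty range to the (hypothetical) partitioned table.
ALTER TABLE daily_sales ADD PARTITION p2008_03
    STARTING ('2008-03-01') ENDING ('2008-03-31');

-- Populate the new range with LOAD from cursor (CLP commands).
DECLARE c_mar CURSOR FOR
    SELECT sale_date, store_id, amount
    FROM staging_sales
    WHERE sale_date BETWEEN '2008-03-01' AND '2008-03-31';

LOAD FROM c_mar OF CURSOR
    INSERT INTO daily_sales (sale_date, store_id, amount)
    INDEXING MODE INCREMENTAL
    ALLOW READ ACCESS;
```

INDEXING MODE INCREMENTAL asks the utility to extend existing indexes rather than rebuild them, and ALLOW READ ACCESS keeps the previously loaded data readable during the load.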


Use the following roll-in and roll-out best practices:

• Use table partitioning to roll-out large volumes of data.

• Use ALTER TABLE … ADD PARTITION to add an empty partition and populate it using

the LOAD utility when using table partitioning for roll-in of data.


Materialized query table (MQT) best practices

An MQT is a table whose definition is based on the result of a query. The MQT

contains pre-computed results. MQTs are a powerful way to improve response times for

complex queries, especially queries that might require some of the following types of

data or operations:

• Aggregated data over one or more dimensions

• Joins and aggregated data between tables in a group

• Data from a commonly accessed subset of data—that is, from a hot horizontal or

vertical database partition

• Repartitioned data from a table, or part of a table, in a partitioned database

environment

• Replicated MQTs can reduce network traffic for non-partitioned tables in a DPF

environment

In addition to speeding up query performance, MQTs can be used on nicknames of

federated data sources to maintain frequently accessed data locally. MQTs can be

maintained with SQL or Q Replication (the system-maintained MQT option for

Federated Nicknames is not supported).

MQTs are completely transparent to applications. Knowledge of MQTs is integrated into

the SQL and XQuery compiler, which determines whether an MQT should be used to

answer all or part of a query. As a result, you can create and drop MQTs, without making

application code changes, much like you can create and drop indexes without making

application code changes.
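As an illustration of this transparency, the following sketch creates a deferred-refresh MQT over a small base table. The table names, column names, and the aggregate chosen are hypothetical.

```sql
-- Hypothetical base table and summary MQT; all names are illustrative.
CREATE TABLE orders (
    order_date DATE          NOT NULL,
    region     INTEGER       NOT NULL,
    amount     DECIMAL(12,2) NOT NULL
);

CREATE TABLE orders_by_region AS (
    SELECT region, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM orders
    GROUP BY region
)
DATA INITIALLY DEFERRED REFRESH DEFERRED;

-- Populate the MQT, then allow the optimizer to consider deferred MQTs
-- for dynamic SQL in this session.
REFRESH TABLE orders_by_region;
SET CURRENT REFRESH AGE ANY;

-- Unchanged application query; the compiler may answer it from the MQT.
SELECT region, SUM(amount) FROM orders GROUP BY region;

-- Dropping the MQT later requires no application change either.
DROP TABLE orders_by_region;
```

The application query references only the base table; whether it is rerouted to the MQT is decided by the compiler, not by the application.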

Figure 11 summarizes the characteristics of MQTs according to their refresh type. In the

table, “Optimization” indicates that the DB2 database manager will exploit the deferred

MQT where possible, when it processes a query, whereas, “No optimization” indicates

that the MQT will not be looked at, since it could be arbitrarily stale; that is, the database

manager does not know when the last refresh occurred against the MQT.

Note that MQTs can decrease INSERT performance of the base table.

To assist in problem determination, the DB2 9 explain facility indicates why an MQT was

not chosen for an access path.


Figure 11 Summary of MQT characteristics by refresh type

Use the following MQT design best practices:

• Create an MQT by using the same or higher isolation level that is used by the

queries for which you intend to use the MQT. The isolation levels, in order of

descending restrictiveness, are RR, RS, CS, and UR.

• Focus on frequently-used queries that use a lot of resources. These queries

provide the greatest opportunities for performance gains through MQTs.

• Set a limit on the number of MQTs that you are willing to maintain. There are

two reasons for this:

o Each MQT consumes storage space on disk and adds UPDATE

overhead.

o Each MQT adds complexity to the search for the optimal QEP, increasing

query compilation time.


• Decide on a limit for the amount of disk space available for MQTs. Generally, do

not allocate more than 10% to 20% of the total system storage of a data

warehouse for MQTs.

• Consider indexing the MQTs and execute RUNSTATS after index creation. Try to

create an MQT that is generally useful to multiple queries. Often such an MQT is

not a perfect match for a query and might require indexing. Replicated MQTs

should have the same indexing design as the base table.

• Help the query compiler find matching MQTs. (MQT routing is complex.) Give

the compiler as much information as possible by using the following techniques:

o Keep statistics on the MQTs up-to-date.

o Use RI on foreign key columns in the MQT. (To avoid system overhead,

specify non-enforced RI.) Make FK columns NOT NULL.

o Avoid problematic MQT designs that make routing difficult. Try to

avoid using EXISTS, NOT EXISTS, and SELECT DISTINCT. Unless the

MQT is an exact match for a query, these constructs can make it difficult

for the query compiler to make use of the MQT.
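Several of these techniques can be sketched together. The schema APP and every table, constraint, and index name below are hypothetical; the sketch shows a NOT NULL foreign key column, a non-enforced RI constraint, an index on the MQT, and a RUNSTATS run after index creation.

```sql
-- Hypothetical star-schema fragment; schema APP and all object names are assumed.
CREATE TABLE app.store (
    store_id INTEGER NOT NULL PRIMARY KEY,
    region   INTEGER NOT NULL
);

CREATE TABLE app.sales (
    store_id INTEGER       NOT NULL,   -- FK column declared NOT NULL
    amount   DECIMAL(12,2) NOT NULL
);

-- Informational (non-enforced) RI: usable by the optimizer, not checked on update.
ALTER TABLE app.sales
    ADD CONSTRAINT fk_sales_store FOREIGN KEY (store_id)
    REFERENCES app.store (store_id)
    NOT ENFORCED ENABLE QUERY OPTIMIZATION;

CREATE TABLE app.sales_by_region AS (
    SELECT st.region, COUNT(*) AS sale_count, SUM(s.amount) AS total_amount
    FROM app.sales s JOIN app.store st ON s.store_id = st.store_id
    GROUP BY st.region
)
DATA INITIALLY DEFERRED REFRESH DEFERRED;

REFRESH TABLE app.sales_by_region;

-- Index the MQT, then collect statistics (RUNSTATS is a CLP command).
CREATE INDEX app.ix_sbr_region ON app.sales_by_region (region);
RUNSTATS ON TABLE app.sales_by_region WITH DISTRIBUTION AND INDEXES ALL;
```

Because the constraint is not enforced, the application remains responsible for keeping the store_id values valid.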


Post-design tools for improving designs for existing databases

Explain facility best practices

The explain facility can show you whether design features are being used. For example, it

can show you whether indexes are being accessed in a QEP, whether partition

elimination is being used, and whether queries are being routed to MQTs.

Consider the QEP fragment shown in Figure 12, taken from the explain facility for Query

20 of TPC-H (see footnote 8).

Figure 12 Fragment of QEP for Query 20 of TPC-H

The QEP clearly shows that the information for PARTSUPP requires access to both the

index TPCD.UXPS_PK2KSC and the PARTSUPP table itself. How can you determine the

reason?

8 TPC-H: The TPC Benchmark™H (TPC-H) is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.


Looking at operator (15), you can see that the FETCH operator requires access to the

PARTSUPP table because the index includes PS_PARTKEY and PS_SUPPKEY columns,

but does not include the PS_AVAILQTY column. This strongly suggests that by adding

the PS_AVAILQTY column to this index, you can avoid accessing the PARTSUPP table

in the subplan, thereby improving performance.
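One way to act on this observation is sketched below. The index name is hypothetical, and the key order is an assumption because the definition of the existing index is not shown here. If the existing index is unique, the INCLUDE clause of CREATE UNIQUE INDEX is another way to carry PS_AVAILQTY in the index.

```sql
-- Hypothetical index name; the key order is assumed, with PS_AVAILQTY appended
-- so that this part of the plan can be satisfied from the index alone.
CREATE INDEX TPCD.IX_PS_PK_SK_AQ
    ON TPCD.PARTSUPP (PS_PARTKEY, PS_SUPPKEY, PS_AVAILQTY);

-- Collect statistics so the optimizer can cost the new index (CLP command).
RUNSTATS ON TABLE TPCD.PARTSUPP WITH DISTRIBUTION AND INDEXES ALL;
```

After the index exists, rerun the explain facility on the query to confirm that the FETCH on PARTSUPP is gone from the subplan.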

The explain output shown in Figure 13 (from DB2 9.1) indicates which MQTs the

optimizer considered but did not choose for a QEP, and explains why. The reason might

be due to cost, or due to the fact that the MQT is not similar enough to be matched.

Examples:

explain plan for select c1, count(*) from t1 where c2 >= 10 group by c1;

EXP0073W The following MQT or statistical view was not eligible because one or more data filtering predicates from the query could not be matched with the MQT: "PKSCHO "."MQT2".
EXP0073W The following MQT or statistical view was not eligible because one or more data filtering predicates from the query could not be matched with the MQT: "PKSCHO "."MQT3".
EXP0148W The following MQT or statistical view was considered in query matching: "PKSCHO "."MQT1".
EXP0148W The following MQT or statistical view was considered in query matching: "PKSCHO "."MQT2".
EXP0148W The following MQT or statistical view was considered in query matching: "PKSCHO "."MQT3".
EXP0149W The following MQT was used (from those considered) in query matching: "PKSCHO "."MQT1".

Figure 13 Using the explain facility to understand MQT selection

Utilize the explain facility to help understand your design choices.

DB2 Design Advisor best practices

The DB2 Design Advisor is a key feature of the DB2 autonomic computing initiative. It is

a push-button solution: given a workload (user provided or system detected) and,

optionally, a disk constraint (see footnote 9), the Design Advisor recommends physical

database design options that are designed to optimize the execution of the workload

provided. The Design Advisor performs extensive “what-if” analysis, data sampling, and

correlation modeling to explore thousands of design permutations that humans cannot.

The Design Advisor has the following capabilities:

o Index selection

o MQT selection

o MDC selection

o Partitioning selection (for database partitioning)

o Industry-leading workload compression

9 disk constraint: A limit on the amount of disk space the Design Advisor can consider available for adding new design features. For example, the limit might be 100MB, and that would mean that the new design aspects recommended by the Design Advisor, such as additional indexes or MQTs, should not consume more than an additional 100MB in total.


Many customers have reported using the Design Advisor to make dramatic

improvements in physical database design, leading to performance improvements of

over five times for individual queries or entire workloads. Of course, you should not

apply the results from the Design Advisor without due consideration.

Figure 14 highlights the benefit of the Design Advisor. In this example, a decision-

support database running the TPC-H workload and data set was created with a

reasonable set of indexes, meaning that a good database designer could have come up

with this set and considered it adequate. The Design Advisor was then used to provide

additional recommendations for the database, which, when applied, resulted in a

six-and-a-half-times performance gain.

Figure 14 Benefits from DB2 Design Advisor

MDC selection capability of the DB2 Design Advisor

For improved workload performance, use the MDC selection capability of the Design

Advisor to obtain recommended clustering dimensions for use in an MDC table,

including coarsification on base columns. Only single-column dimensions, and not

composite-column dimensions, are considered, although single or multiple dimensions

can be recommended for the table.

The MDC selection capability is enabled using the -m <advise type> flag on the

db2advis utility. The advise types (“C” for MDC and clustering indexes, “I” for index,

“M” for MQT, and “P” for database partitioning) can be used in combination with each

other.


The MDC recommendations provided by the Design Advisor are intended to provide

optimized density and to limit the amount of table expansion that will occur when the

table is converted to MDC. The analysis operations within the advisor include not only

the benefits of block-index access, but also the impact of MDC on INSERT, UPDATE, and

DELETE operations against the dimensions of the table.

The output includes generated-column expressions for each table for coarsified

dimensions that appear in the MDC solution, and an ORGANIZE BY clause

recommended for each table.

Use the following Design Advisor best practices:

• Provide a broad representation of your workload as input, and avoid running

the Design Advisor for one query at a time. This allows the Design Advisor to

make recommendations that apply to an entire workload rather than to a single

query, perhaps to the detriment of other parts of the workload.

• Include as input the INSERT, UPDATE, and DELETE operations that occur in

your workload so that the Design Advisor can model the drawbacks and benefits

(of adding new design features) to queries. (For example, new indexes have

maintenance drawbacks in addition to their value in improving query execution

times.)

• Use the MDC selection capability of the DB2 Design Advisor (on tables that are

greater than 12 extents in size) to obtain recommended clustering dimensions for

use in MDC tables for improved workload performance.

• Use Query Patroller or the DB2 9.5 Workload Manager to automatically capture

your actual workload in a format that serves as input to the Design Advisor.
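A workload file for the db2advis utility might look like the following sketch. The file name, database name, table, statements, frequencies, and limits are all hypothetical; the point is that the file represents the workload broadly and includes INSERT, UPDATE, and DELETE statements alongside the queries. The invocation in the comment uses -m MIC to request MQT, index, and MDC recommendations together.

```sql
-- workload.sql: hypothetical workload file for db2advis.
-- Example invocation (database name, disk limit in MB, and time limit in
-- minutes are placeholders):
--   db2advis -d SAMPLEDB -i workload.sql -m MIC -l 500 -t 10 -o advice.sql

--#SET FREQUENCY 100
SELECT region, SUM(amount) FROM app.sales_hist GROUP BY region;

--#SET FREQUENCY 20
SELECT store_id, COUNT(*) FROM app.sales_hist
    WHERE sale_date >= '2008-01-01' GROUP BY store_id;

--#SET FREQUENCY 500
INSERT INTO app.sales_hist (sale_date, store_id, region, amount)
    VALUES ('2008-03-01', 17, 4, 19.99);

--#SET FREQUENCY 50
UPDATE app.sales_hist SET amount = amount * 1.05 WHERE store_id = 17;

--#SET FREQUENCY 5
DELETE FROM app.sales_hist WHERE sale_date < '2007-01-01';
```

The frequency directives weight each statement, so a heavily executed INSERT can outweigh the benefit of an index that helps only a rarely run query.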


Best Practices

Datatype selection

• Choose numeric datatypes over character datatypes whenever

possible.

• Use data modeling tools, such as Rational Data Architect, to

publish to a larger team.

• Store value definitions in a table, where the definitions can be

joined to values to provide context.

Table normalization and denormalization

• Normalize your tables using Third Normal Form (3NF) for most

general-purpose databases, the star schema or snowflake model

for dimensional queries, and the IBM Layered Data Architecture

for broad-based data warehousing, online analytical processing

(OLAP), and business intelligence (BI).

Index design

• Design a basic set of indexes using workload predicates and

primary keys (PKs) and foreign keys (FKs). Indexes are the single

most important physical database design feature. (Remember

that indexes and Refresh Immediate MQTs incur a penalty for

INSERT, UPDATE, and DELETE operations.)

Data clustering and MDC

• Use MDC to improve query performance, and for roll-in and roll-

out of data.


Database partitioning (shared-nothing hash partitioning)

• Use database partitioning to improve scalability for large BI

applications.

• Focus on both high cardinality of the partitioning key and

improved collocation of joins when selecting the partitioning key.

• Use hash-partitioning (recommended primarily for data

warehousing, which benefits from shared-nothing databases).

Table (range) partitioning

• Use range-clustered tables (RCTs) to provide fast, direct access to

data.

• Design table partitions based on roll-in and roll-out

characteristics. Partitioning by month or financial quarter is a

good strategy.

UNION ALL View (UAV) partitioning

• Use UAVs when replication is needed on certain ranges of data,

while other ranges that do not require replication can benefit

from compression. UAVs allow you to have different

characteristics on different objects that underlay the view. In

general, homogeneity provides a cleaner and more maintainable

architecture. However, there are exceptions where this ability to

mix and match is needed.

• Use database partitioning for scalability of decision support,

business intelligence, data warehousing, and reporting

workloads, rather than UAVs.

• Use table partitioning to improve recovery efficiency and roll-out

efficiency.

Roll-in and roll-out of data with table partitioning and MDC

• Use table partitioning for roll-out, and MDC on a single

dimension for roll-in.


Database partitioning, table (range) partitioning, and MDC in the same

database

• Implement database partitioning, table partitioning, and MDC in

the same database design to deploy large scale applications.

MQTs

• Use replicated MQTs to improve collocation of joins for database

partitioning, and query access to aggregated data.

• Help the query compiler find MQTs by keeping MQT statistics

up-to-date, by defining functional dependencies, and by defining

referential integrity (RI) (including FK columns in the MQT,

defined as NOT NULL). Avoid problematic MQT designs that

make routing difficult by avoiding the use of EXISTS, NOT

EXISTS, and SELECT DISTINCT clauses, unless the MQT is an

exact match for the query.

Post-design tools for improving designs for existing databases

• Use the explain facility to help understand your design choices.

• Use the DB2 Design Advisor to generate ideas for physical

database design improvements (for indexes, MQTs, and

partitioning). When doing so, provide as input a set of queries,

not just one query at a time. This allows the Design Advisor to

make trade-offs across the workload.

• Utilize the DB2 9.5 Workload Manager (WLM), Query Patroller,

Snapshot Scripts, or Statement Event Monitoring to automatically

capture SQL statements for input to the explain facility and to the

DB2 Design Advisor.


Conclusion

Physical database design is the single most important quality of any database. It affects

the scalability, efficiency, maintainability and extensibility of a database like no other

aspect of database administration. Although database design can be complex, a good

design improves performance and reduces operational risk. Mastery of this talent is

undoubtedly the cornerstone of professional database administrators.


Further reading

• DB2 Best Practices

http://www.ibm.com/developerworks/db2/bestpractices/

• DB2 9 for Linux, UNIX and Windows manuals

http://www.ibm.com/support/docview.wss?rs=71&uid=swg27009552

• DB2 Data Warehouse Edition documentation

http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.dwe.welcome.doc/dwe91docs.html

• IBM Balanced Warehouse documentation

http://www.ibm.com/software/data/infosphere/balanced-warehouse/

• IBM Data Warehousing and Business Intelligence documentation

http://www.ibm.com/software/data/db2bi/

• IBM DB2 9.5 for Linux, UNIX, and Windows Information Center

http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp

• S. Lightstone, T. Teorey, T. Nadeau, “Physical Database Design: the database

professional's guide to exploiting indexes, views, storage, and more”, Morgan

Kaufmann Press, 2007. ISBN: 0123693896

• Sam S. Lightstone, “Best Practices for Creating Scalable High Quality Data

Warehouses with DB2”, IBM Information On Demand 2007 Global Conference,

October 14 - 19, 2007. Mandalay Bay Resort, Las Vegas, NV

• T. Teorey, S. Lightstone, T. Nadeau, “Database Modeling & Design: Logical

Design, 4th edition”, Morgan Kaufmann Press, 2005. ISBN: 0-12-685352-5


Contributors

Kevin L. Beck

Information Management Software

Brad Cassells

DB2 Information Development

Karl Fleckenstein

Senior IT Architect

Lead Architect for SAP/DB2 Solutions

John Hornibrook

DB2 Query Optimization Development

Norma Mullin

Enterprise Data Management

Consulting IT Specialist

Reuven Stepansky

Senior Managing Specialist

North American Lab Services

Tim Vincent

Chief Architect DB2 LUW


Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other

countries. Consult your local IBM representative for information on the products and services

currently available in your area. Any reference to an IBM product, program, or service is not

intended to state or imply that only that IBM product, program, or service may be used. Any

functionally equivalent product, program, or service that does not infringe any IBM

intellectual property right may be used instead. However, it is the user's responsibility to

evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in

this document. The furnishing of this document does not grant you any license to these

patents. You can send license inquiries, in writing, to:

IBM Director of Licensing

IBM Corporation

North Castle Drive

Armonk, NY 10504-1785

U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where

such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES

CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER

EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-

INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do

not allow disclaimer of express or implied warranties in certain transactions, therefore, this

statement may not apply to you.

Without limiting the above disclaimers, IBM provides no representations or warranties

regarding the accuracy, reliability or serviceability of any information or recommendations

provided in this publication, or with respect to any results that may be obtained by the use of

the information or observance of any recommendations provided herein. The information

contained in this document has not been submitted to any formal IBM test and is distributed

AS IS. The use of this information or the implementation of any recommendations or

techniques herein is a customer responsibility and depends on the customer’s ability to

evaluate and integrate them into the customer’s operational environment. While each item

may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee

that the same or similar results will be obtained elsewhere. Anyone attempting to adapt

these techniques to their own environment do so at their own risk.

This document and the information contained herein may be used solely in connection with

the IBM products discussed in this document.

This information could include technical inaccuracies or typographical errors. Changes are

periodically made to the information herein; these changes will be incorporated in new

editions of the publication. IBM may make improvements and/or changes in the product(s)

and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only

and do not in any manner serve as an endorsement of those Web sites. The materials at

those Web sites are not part of the materials for this IBM product and use of those Web sites is

at your own risk.

IBM may use or distribute any of the information you supply in any way it believes

appropriate without incurring any obligation to you.

Any performance data contained herein was determined in a controlled environment.

Therefore, the results obtained in other operating environments may vary significantly. Some

measurements may have been made on development-level systems and there is no

guarantee that these measurements will be the same on generally available systems.

Furthermore, some measurements may have been estimated through extrapolation. Actual

results may vary. Users of this document should verify the applicable data for their specific

environment.


Information concerning non-IBM products was obtained from the suppliers of those products,

their published announcements or other publicly available sources. IBM has not tested those

products and cannot confirm the accuracy of performance, compatibility or any other

claims related to non-IBM products. Questions on the capabilities of non-IBM products should

be addressed to the suppliers of those products.

All statements regarding IBM's future direction or intent are subject to change or withdrawal

without notice, and represent goals and objectives only.

This information contains examples of data and reports used in daily business operations. To

illustrate them as completely as possible, the examples include the names of individuals,

companies, brands, and products. All of these names are fictitious and any similarity to the

names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate

programming techniques on various operating platforms. You may copy, modify, and

distribute these sample programs in any form without payment to IBM, for the purposes of

developing, using, marketing or distributing application programs conforming to the

application programming interface for the operating platform for which the sample

programs are written. These examples have not been thoroughly tested under all conditions.

IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these

programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall

not be liable for any damages arising out of your use of the sample programs.

Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International

Business Machines Corporation in the United States, other countries, or both. If these and

other IBM trademarked terms are marked on their first occurrence in this information with a

trademark symbol (® or ™), these symbols indicate U.S. registered or common law

trademarks owned by IBM at the time this information was published. Such trademarks may

also be registered or common law trademarks in other countries. A current list of IBM

trademarks is available on the Web at “Copyright and trademark information” at

www.ibm.com/legal/copytrade.shtml

Windows is a trademark of Microsoft Corporation in the United States, other countries, or

both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.
