Stack It And Unpack It

DESCRIPTION

Partitioning and Compression for Data Warehouses.

TRANSCRIPT

1. Stack It & Pack It: Partitioning And Compression For Warehouses / VLDB, Jeff Moss

2. Who Dunnit?

3. Agenda

  • My background
  • Squeeze your data with data segment compression
  • Partition for success
  • Questions

4. My Background

  • Independent Consultant
  • 13 years Oracle experience
  • Blog: http://oramossoracle.blogspot.com/
  • Focused on warehousing / VLDB since 1998
  • First project
    • UK Music Sales Data Mart
    • Produces BBC Radio 1 Top 40 chart and many more
    • 2 billion row sales fact table
    • 1 TB total database size
  • Currently working with E.ON UK (Powergen)
    • 4 TB Production Warehouse, 8 TB total storage
    • Oracle Product Stack

5. What Is Data Segment Compression ?

  • Compresses data by eliminating intra-block repeated column values
  • Reduces the space required for a segment
    • but only if there are appropriate repeats!
  • Self-contained
  • Lossless algorithm
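
A minimal sketch of observing the space saving, assuming a SALES table exists in the current schema (all object names are illustrative):

    -- Build an uncompressed and a compressed copy of the same data
    -- (CTAS is a direct path operation, so COMPRESS takes effect)
    CREATE TABLE sales_nocomp NOCOMPRESS AS SELECT * FROM sales;
    CREATE TABLE sales_comp   COMPRESS   AS SELECT * FROM sales;

    -- Compare the space each segment actually uses
    SELECT segment_name, blocks, ROUND(bytes/1024/1024) AS mb
    FROM   user_segments
    WHERE  segment_name IN ('SALES_NOCOMP', 'SALES_COMP');

The saving depends entirely on how many repeated column values end up in the same blocks.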

6. Where Can Data Segment Compression Be Used ?

  • Can be used with a number of segment types
    • Heap & Nested Tables
    • Range or List Partitions
    • Materialized Views
  • Can't be used with
    • Subpartitions
    • Hash Partitions
    • Indexes (though these have their own row-level compression)
    • IOTs
    • External Tables
    • Tables that are part of a Cluster
    • LOBs
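
A hedged sketch of declaring compression at the partition and materialized view level (all object names are illustrative):

    CREATE TABLE sales_fact (
      sales_date  DATE,
      customer_id NUMBER,
      amount      NUMBER
    )
    PARTITION BY RANGE (sales_date) (
      PARTITION p_2004 VALUES LESS THAN (DATE '2005-01-01') COMPRESS,    -- old, static data
      PARTITION p_2005 VALUES LESS THAN (DATE '2006-01-01') NOCOMPRESS   -- current, volatile data
    );

    CREATE MATERIALIZED VIEW mv_sales_by_month COMPRESS AS
      SELECT TRUNC(sales_date, 'MM') AS sales_month, SUM(amount) AS amount
      FROM   sales_fact
      GROUP  BY TRUNC(sales_date, 'MM');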

7. How Does Segment Compression Work?

  • Each database block holds a symbol table and a row data area: repeated column values are stored once in the symbol table and referenced from the rows.
  • Block overhead: common header (20 bytes), transaction header (24 bytes fixed + 24 bytes per ITL), table directory (8 bytes), row directory (2 bytes per row), data header (14 bytes), compressed data header (16 bytes, variable), tail (4 bytes).
  • Worked example (columns ID, DESCRIPTION, CONTACT TYPE, OUTCOME, FOLLOWUP): values such as "Call to discuss new product", "TEL", "NO" and "N/A" repeat across rows 100 to 102 and are replaced by references into the symbol table.

8. What Affects Compression?

  • Undisclosed Algorithm
    • I asked but support wouldn't play ball!
  • Many Factors
    • Block size
    • Anything which affects block overhead
      • Interested Transaction Lists (INITRANS)
      • Number of columns
      • Number of rows
      • PCTFREE
    • Number of repeats (in the block)
    • Length of column value(s)
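
A rough way to see some of these factors at work: build compressed copies of the same data with different block overhead settings and compare segment sizes (a sketch, assuming a SALES table; names are illustrative):

    CREATE TABLE sales_c_lean
      PCTFREE 0 INITRANS 1 COMPRESS
      AS SELECT * FROM sales;

    CREATE TABLE sales_c_fat
      PCTFREE 20 INITRANS 10 COMPRESS      -- more per-block overhead, less room for repeats
      AS SELECT * FROM sales;

    SELECT segment_name, blocks
    FROM   user_segments
    WHERE  segment_name LIKE 'SALES_C_%'
    ORDER  BY blocks;

The test_*_compression.sql scripts listed in the references exercise each factor in turn.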

9. Compression v Block Size

  • 200K rows, Non ASSM Uniform Local extents
  • Bigger block size = more chance of repeats in any given block = better compression

10. Compression v ITL

  • 10K rows, Non ASSM Uniform Local extents
  • More ITL = more overhead = less repeats

11. Compression v Number Of Columns

  • 500K rows, Non ASSM Uniform Local extents
  • Same amount of data to store
  • More columns = more overhead = less repeats

12. Compression v PCTFREE

  • 200K rows, Non ASSM Uniform Local extents
  • Higher PCTFREE = less usable space per block = less repeats

13. Compression v NDV

  • 200K rows, Non ASSM Uniform Local extents
  • Higher NDV = less repeats

14. Compression v Column Length

  • 80K rows, Non ASSM Uniform Local extents
  • Minimum 6 characters for compression
  • Longer Length = more compression savings

15. Compression v Ordering

  • Colocate data to maximise compression benefits
  • For maximum compression
    • Minimise the total space required by the segment
    • Identify the most compressible column(s)
  • For optimal access
    • We know how the data is to be queried
    • Order the data by
      • Access path columns
      • Then the next most compressible column(s)
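
A minimal sketch of the reorder-then-compress idea (assumes a SALES_FACT table; column choices are illustrative):

    -- Rebuild the segment with rows colocated on the access path column first,
    -- then on the most compressible remaining column, so repeats land in the same blocks
    CREATE TABLE sales_fact_comp COMPRESS AS
      SELECT *
      FROM   sales_fact
      ORDER  BY sales_date,       -- most common access path
                contact_type;     -- most compressible remaining column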

Diagram: the values 1 to 5 uniformly distributed across blocks (1 2 3 4 5 repeated) versus colocated (1 1 1 1, 2 2 2 2, ...), showing how colocation concentrates repeats within each block.

16. Get Max Compression Order Package

    PROCEDURE mgmt_p_get_max_compress_order
     Argument Name                  Type                    In/Out Default?
     ------------------------------ ----------------------- ------ --------
     P_TABLE_OWNER                  VARCHAR2                IN     DEFAULT
     P_TABLE_NAME                   VARCHAR2                IN
     P_PARTITION_NAME               VARCHAR2                IN     DEFAULT
     P_SAMPLE_SIZE                  NUMBER                  IN     DEFAULT
     P_PREFIX_COLUMN1               VARCHAR2                IN     DEFAULT
     P_PREFIX_COLUMN2               VARCHAR2                IN     DEFAULT
     P_PREFIX_COLUMN3               VARCHAR2                IN     DEFAULT

    BEGIN
      mgmt_p_get_max_compress_order(p_table_owner => 'AE_MGMT'
                                   ,p_table_name  => 'BIG_TABLE'
                                   ,p_sample_size => 10000);
    END;
    /
Running mgmt_p_get_max_compress_order...
----------------------------------------------------------------------------------------------------
Table: BIG_TABLE   Sample Size: 10000   Unique Run ID: 25012006232119   ORDER BY Prefix:
----------------------------------------------------------------------------------------------------
Creating MASTER Table: TEMP_MASTER_25012006232119
Creating COLUMN Table 1: COL1
Creating COLUMN Table 2: COL2
Creating COLUMN Table 3: COL3
----------------------------------------------------------------------------------------------------
The output below lists each column in the table and the number of blocks/rows and space used when
the table data is ordered by only that column or, where a prefix has been specified, by the prefix
and then that column. From this one can determine whether a specific ORDER BY can be applied to the
data to maximise compression within the table while, if a prefix is present, still ordering the data
as efficiently as possible for the most common access path(s).
----------------------------------------------------------------------------------------------------
NAME                             COLUMN   BLOCKS   ROWS    SPACE_GB
===============================  =======  =======  ======  ========
TEMP_COL_001_25012006232119      COL1     290      10000   .0022
TEMP_COL_002_25012006232119      COL2     345      10000   .0026
TEMP_COL_003_25012006232119      COL3     555      10000   .0042

17. Pros & Cons

  • Pros
    • Saves space
      • Reduces LIO / PIO
      • Speeds up backup/recovery
      • Improves query response time
    • Transparent
      • To readers
      • and writers
    • Decreases time to perform some DML
      • Deletes should be quicker
      • Bulk inserts may be quicker

18. Pros & Cons

  • Cons
    • Increases CPU load
    • Compression is only applied during Direct Path operations
      • CTAS
      • Serial Inserts using INSERT /*+ APPEND */
      • Parallel Inserts (PDML)
      • ALTER TABLE ... MOVE
      • Direct Path SQL*Loader
    • Increases time to perform some DML
      • Bulk inserts may be slower
      • Updates are slower
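
A hedged sketch of the direct path operations that actually produce compressed blocks (table names are illustrative):

    -- Serial direct path insert
    INSERT /*+ APPEND */ INTO sales_fact_comp
      SELECT * FROM sales_staging;
    COMMIT;

    -- Parallel DML (direct path once PDML is enabled)
    ALTER SESSION ENABLE PARALLEL DML;
    INSERT /*+ PARALLEL(sales_fact_comp, 4) */ INTO sales_fact_comp
      SELECT * FROM sales_staging;
    COMMIT;

    -- Re-pack an existing segment
    ALTER TABLE sales_fact_comp MOVE COMPRESS;

Conventional path inserts into the same table still work, but the new rows are stored uncompressed.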

19. Data Warehousing Specifics

  • Star Schema compresses better than Normalized
    • More redundant data
  • Focus on
    • Fact Tables and Summaries in Star Schema
    • Transaction tables in Normalized Schema
  • Performance Impact 1
    • Space Savings
      • Star schema: 67%
      • Normalized: 24%
    • Query Elapsed Times
      • Star schema: 16.5%
      • Normalized: 10%

1 - Table Compression in Oracle 9iR2: A Performance Analysis

20. Things To Watch Out For

  • DROP COLUMN is awkward
    • ORA-39726: Unsupported add/drop column operation on compressed tables
    • Uncompress the table and try again - still gives ORA-39726!
  • After UPDATEs data is uncompressed
    • Performance impact
    • Row migration
  • Use appropriate physical design settings
    • PCTFREE 0 - pack each block
    • Large block size - reduce overhead / increase repeats per block
    • Minimise INITRANS - reduce overhead
  • Order data for best compression / access path
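
A sketch of re-packing a partition after heavy UPDATE activity, applying the physical design settings above (object names are illustrative):

    -- Rebuild the partition compressed, with no reserved free space and minimal ITL overhead
    ALTER TABLE sales_fact
      MOVE PARTITION p_jan_2005
      PCTFREE 0 INITRANS 1 COMPRESS;

    -- The move leaves local index partitions UNUSABLE, so rebuild them
    ALTER INDEX sales_fact_lix1 REBUILD PARTITION p_jan_2005;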

21. A Funny Thing

  • Block dump trace files still show 9iR2 even in 10g releases
  • ALTER SYSTEM DUMP DATAFILE x BLOCK y;

Thanks to Julian Dyke for the block dumping information: http://www.juliandyke.com

22. What Is Partitioning?

  • "Partitioning addresses key issues in supporting very large tables and indexes by letting you decompose them into smaller and more manageable pieces called partitions." - Oracle Database Concepts Manual, 10gR2
  • Introduced in Oracle 8.0
  • Numerous improvements since
  • Subpartitioning adds another level of decomposition
  • Partitions and Subpartitions are logical containers
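
A minimal sketch of a range partitioned table with hash subpartitions as the second level of decomposition (all names are illustrative):

    CREATE TABLE sales_fact (
      sales_date  DATE,
      customer_id NUMBER,
      amount      NUMBER
    )
    PARTITION BY RANGE (sales_date)
    SUBPARTITION BY HASH (customer_id) SUBPARTITIONS 4 (
      PARTITION p_jan_2005 VALUES LESS THAN (DATE '2005-02-01'),
      PARTITION p_feb_2005 VALUES LESS THAN (DATE '2005-03-01')
    );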

23. Partition To Tablespace Mapping

  • Partitions map to tablespaces
    • Partition can only be in One tablespace
    • Tablespace can hold many partitions
    • Highest granularity is One tablespace per partition
    • Lowest granularity is One tablespace for all the partitions
  • Tablespace volatility
    • Read / Write
    • Read Only
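
A sketch of the monthly-partition-to-quarterly-tablespace mapping in the diagram below (tablespace names are illustrative and must already exist):

    CREATE TABLE sales_fact (
      sales_date DATE,
      amount     NUMBER
    )
    PARTITION BY RANGE (sales_date) (
      PARTITION p_jan_2005 VALUES LESS THAN (DATE '2005-02-01') TABLESPACE t_q1_2005,
      PARTITION p_feb_2005 VALUES LESS THAN (DATE '2005-03-01') TABLESPACE t_q1_2005,
      PARTITION p_mar_2005 VALUES LESS THAN (DATE '2005-04-01') TABLESPACE t_q1_2005,
      PARTITION p_apr_2005 VALUES LESS THAN (DATE '2005-05-01') TABLESPACE t_q2_2005
      -- ... and so on, one tablespace per quarter
    );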

Diagram: monthly partitions P_JAN_2005 through P_MAR_2006 mapped onto quarterly tablespaces T_Q1_2005 through T_Q1_2006; completed quarters are Read Only, the current quarter is Read / Write.

24. Read Only Tablespaces

  • Quicker checkpointing
  • Quicker backup
  • Quicker recovery
  • Reduced space use via compression
  • But the benefit depends on the partition-to-tablespace granularity
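
A sketch of switching a completed quarter to read only so routine backups can skip it (tablespace name is illustrative):

    ALTER TABLESPACE t_q1_2005 READ ONLY;

    -- Once a final backup of the tablespace exists, RMAN can leave it out of routine backups
    RMAN> BACKUP DATABASE SKIP READONLY;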

25. Why Partition? - Performance

  • Improved query performance
    • Pruning or elimination
    • Partition wise joins
      • Full
      • Partial
  • Selective Compression
    • By Partition
  • Selective Reorganisation
    • Index Partition REBUILD
    • Table Partition MOVE
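
A hedged sketch of a full partition-wise join: both tables are hash partitioned identically on the join key, so matching partition pairs can be joined independently and in parallel (all names are illustrative):

    CREATE TABLE sales_hash (
      customer_id NUMBER,
      amount      NUMBER
    )
    PARTITION BY HASH (customer_id) PARTITIONS 8;

    CREATE TABLE customer_hash (
      customer_id NUMBER,
      region      VARCHAR2(30)
    )
    PARTITION BY HASH (customer_id) PARTITIONS 8;

    -- Equi-join on the common partitioning key
    SELECT /*+ PARALLEL(s, 4) PARALLEL(c, 4) */
           c.region, SUM(s.amount)
    FROM   sales_hash s, customer_hash c
    WHERE  c.customer_id = s.customer_id
    GROUP  BY c.region;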

SELECT SUM(sales)
FROM   part_tab
WHERE  sales_date BETWEEN '01-JAN-2005' AND '30-JUN-2005';

Diagram: a Sales Fact Table partitioned by month (JAN through DEC); only the JAN to JUN partitions need to be scanned (pruning example after the Oracle 10gR2 Data Warehousing manual).

26. Why Partition? - Manageability

  • Archiving
    • Use a rolling window approach
    • ALTER TABLE ADD/SPLIT/DROP PARTITION
  • Easier ETL Processing
    • Build a new dataset in a staging table
    • Add indexes and constraints
    • Collect statistics
    • Then swap the staging table for a partition on the target
      • ALTER TABLE ... EXCHANGE PARTITION
  • Easier Maintenance
    • Table partition move, e.g. to compress data
    • Local Index partition rebuild
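
A hedged sketch of the staging-table swap (object names are illustrative; the staging table must match the partitioned table's structure):

    -- 1. Build the new month's data offline in a staging table
    CREATE TABLE stg_sales_jan AS
      SELECT * FROM sales_fact WHERE 1 = 0;        -- same shape, no rows

    INSERT /*+ APPEND */ INTO stg_sales_jan
      SELECT sales_date, customer_id, amount
      FROM   new_sales_feed                        -- hypothetical source of the new data
      WHERE  sales_date < DATE '2005-02-01';
    COMMIT;

    -- 2. Index and analyse the staging table, then swap it in as the January partition
    CREATE INDEX stg_sales_jan_ix1 ON stg_sales_jan (customer_id);

    ALTER TABLE sales_fact
      EXCHANGE PARTITION p_jan_2005 WITH TABLE stg_sales_jan
      INCLUDING INDEXES WITHOUT VALIDATION;

The exchange itself is a data dictionary operation, so the swap is near-instant regardless of the partition size.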

27. Why Partition ? - Scalability

  • Performance is generally consistent and predictable
    • Assuming an appropriate partitioning key is used
    • and data has an even distribution across the key
  • Read only approach
    • Scalable backups - read only tablespaces are ignored
    • so partitions in those tablespaces are ignored
  • Pruning allows consistent query performance

28. Why Partition ? - Availability

  • Offline data impact minimised
    • depending on granularity
    • Quicker recovery
    • Pruned data not missed
    • EXCHANGE PARTITION
      • Allows offline build
      • Quick swap over

Diagram: as before, monthly partitions mapped onto quarterly tablespaces, with completed quarters Read Only and the current quarter Read / Write.

29. Fact Table Partitioning: Transaction Date v Load Date

  • Partitioning by Load Date
    • Easier ETL Processing
      • Each load deals with only 1 partition
    • No use to end user queries!
      • Can't prune - full scans!
  • Partitioning by Transaction Date
    • Harder ETL Processing
      • But still uses EXCHANGE PARTITION
    • Useful to end user queries
      • Allows full pruning capability

Diagram: the same sample rows (Tran Date, Customer, Load Date) placed into January to April partitions twice - once partitioned by Load Date and once by Transaction Date - showing how a row such as Customer 7 (transacted 21-JAN-2005 but loaded 04-APR-2005) lands in a different partition under each scheme.

30. Watch Out For

  • Partition exchange and table statistics 1
    • Partition stats updated
    • but Global stats are NOT!
    • Affects queries accessing multiple partitions
    • Solution
      • Gather stats on staging table prior to EXCHANGE
      • Partition exchange
      • Gather stats on partitioned table using GLOBAL
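
A sketch of that sequence using DBMS_STATS (owner and object names are illustrative):

    BEGIN
      -- 1. Gather stats on the staging table before the exchange
      DBMS_STATS.GATHER_TABLE_STATS(ownname => 'AE_MGMT', tabname => 'STG_SALES_JAN');
    END;
    /

    -- 2. Exchange the staging table for the partition (its stats come across with it)
    ALTER TABLE sales_fact
      EXCHANGE PARTITION p_jan_2005 WITH TABLE stg_sales_jan;

    BEGIN
      -- 3. Refresh the global stats on the partitioned table
      DBMS_STATS.GATHER_TABLE_STATS(
        ownname     => 'AE_MGMT',
        tabname     => 'SALES_FACT',
        granularity => 'GLOBAL');
    END;
    /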

1 - Jonathan Lewis: Cost-Based Oracle Fundamentals, Chapter 2

31. Partitioning Feature / Characteristic Matrix

Matrix mapping features (Partition Truncation, Exchange Partition, Archiving, Pruning / Partition Elimination, Partition-wise joins, Parallel DML, Local Indexes, Read Only Partitions) against the characteristics they support (Performance, Manageability, Scalability, Availability).

32. Questions?

33. References: Papers

  • Table Compression in Oracle 9iR2: A Performance Analysis
  • Table Compression in Oracle 9iR2: An Oracle White Paper
  • Scaling To Infinity, Partitioning In Oracle Data Warehouses, Tim Gorman
  • Decision Speed: Table Compression In Action

34. References: Online Presentation / Code

  • http://www.oramoss.demon.co.uk/presentations/stackitandpackit.ppt
  • http://www.oramoss.demon.co.uk/Code/mgmt_p_get_max_compression_order.prc
  • http://www.oramoss.demon.co.uk/Code/test_dml_performance_delete.sql
  • http://www.oramoss.demon.co.uk/Code/test_dml_performance_insert.sql
  • http://www.oramoss.demon.co.uk/Code/test_dml_performance_update.sql
  • http://www.oramoss.demon.co.uk/Code/test_block_size_compression.sql
  • http://www.oramoss.demon.co.uk/Code/test_column_length_compression.sql
  • http://www.oramoss.demon.co.uk/Code/test_itl_compression.sql
  • http://www.oramoss.demon.co.uk/Code/test_ndv_compression.sql
  • http://www.oramoss.demon.co.uk/Code/test_num_cols_compression.sql
  • http://www.oramoss.demon.co.uk/Code/test_pctfree_compression.sql