TRANSCRIPT
-
Data Warehousing with Oracle Database 10g
-
Boris Gurov
Senior Sales Consultant
Oracle ECE Ltd., Branch Bulgaria
-
Agenda
• Oracle Warehouse Builder
• Oracle Database 10g for Data Warehousing
  – Ensure a well-tuned I/O subsystem
  – Find a schema balance
  – Init.ora settings
• Summary & Close
-
Oracle Warehouse Builder
• Data structure design
• Capture source definitions
• ETL design
• Generate code
• Deploy
• Extract and Transform Data
• Validate design
-
Source Support
• Relational:
  – Oracle
  – IBM DB2
  – SQL Server
  – Sybase
  – etc., including ODBC
• Files
• Applications
  – SAP
-
Target Support
• Oracle 9.2 (tables and queues)
• Oracle 10g (tables and queues)
• Flat files (Oracle database is transformation engine)
-
Design
• Data structures:
  – Dimensional
  – Relational
• ETL processes:
  – Data flows
  – Process flows
• End user access
-
Dimensional Design
-
ETL Design (Data Flows)
-
ETL Design (Process Flows)
-
Code Generation
• Data Definition Language (DDL)
• OLAP metadata creation
• Optimized PL/SQL code
• SQL Loader control files
• ABAP code
• XML Process Definition Language (XPDL)
-
Data Quality in OWB
• Data Quality functionalities are integrated into ETL processes
• Disciplined approach to Data Quality, not an afterthought
• Data Quality is modeled, executed and audited just like any other transformation
• Currently consists of
  – Name and Address Cleansing
  – Match-Merge
-
Name and Address Cleansing
• Transformations are:
  – Parsing
  – Standardization
  – Correction
  – Augmentation
-
Oracle Warehouse Builder
• Enterprise Business Intelligence integration design tool that manages the full lifecycle of data and metadata for Oracle database(s)
-
Agenda
• Oracle Warehouse Builder
• Oracle Database 10g for Data Warehousing
  – Ensure a well-tuned I/O subsystem
  – Find a schema balance
  – Init.ora settings
• Summary & Close
-
Data Warehousing Applications
• Even after decades of innovation, a computer ‘still’ consists of three main components
  – CPU provides the computing power
  – Memory stores the transient data for computational operations
  – Disks (I/O) store the persistent information
• Getting the best performance means finding the right balance of all these components and using them optimally
  – Size your system appropriately
  – Design your database appropriately
  – Use the database appropriately
-
I/O - The ‘disk dilemma’
Evolution of a 180 GB database

Per-disk characteristics:
Year   Disk size   Transfer rate   I/O rate     Seek time   Rotation
1993   1 GB        2 MB/sec        50 IO/sec    20 ms       3,600 rpm
1996   4 GB        6 MB/sec        70 IO/sec    14 ms       7,200 rpm
2000   72 GB       40 MB/sec       160 IO/sec   6 ms        10,000 rpm
2002   180 GB      30 MB/sec       120 IO/sec   8 ms        7,500 rpm

Disks needed to hold the 180 GB database:
Year   Disks   Aggregate transfer rate   Aggregate I/O rate
1993   180     360 MB/sec                9000 IO/sec
1996   50      300 MB/sec                3500 IO/sec
2000   3       120 MB/sec                480 IO/sec
2002   1       30 MB/sec                 120 IO/sec

In 2002, the 180 GB database fits on a single-disk DB system.
Note: an equivalent I/O bus is necessary
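The aggregate figures on this slide follow from simple arithmetic, which the following sketch reproduces. The function name `aggregate` is ours, not from the slides. Note that a pure capacity calculation gives 45 disks for 1996, whereas the slide uses 50; the other three years match the slide exactly.

```python
import math

# Per-disk figures copied from the slide: (capacity in GB, MB/sec, IO/sec).
DISKS = {
    1993: (1, 2, 50),
    1996: (4, 6, 70),
    2000: (72, 40, 160),
    2002: (180, 30, 120),
}


def aggregate(db_size_gb, capacity_gb, mb_per_sec, io_per_sec):
    """Return (disk count, aggregate MB/sec, aggregate IO/sec) for a database."""
    # Minimum number of disks needed purely for capacity.
    count = math.ceil(db_size_gb / capacity_gb)
    return count, count * mb_per_sec, count * io_per_sec


for year, spec in DISKS.items():
    count, mb, io = aggregate(180, *spec)
    print(f"{year}: {count} disks, {mb} MB/sec, {io} IO/sec")
```

The point of the slide shows up in the last line of output: the same database on one large disk gets a small fraction of the throughput it had when spread over many small disks.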
-
I/O – Unlimited Scalability
Parallel Execution
• Use parallelism to enable single process scalability
• Unrestricted parallelism
• No data layout requirements or restrictions (unlike shared-nothing systems)
• All operations can be parallelized
[Figure: work is dispatched to three parallel scan processes and three parallel sort processes (sort A-K, sort L-S, sort T-Z)]
-
I/O – Parallel Execution
[Figure: concurrent operations running at DOP 2, DOP 4, and DOP 8]
• I/O bandwidth requirement increases with single process parallelism and multi-user concurrency
• Plan for your system’s expected I/O throughput based on average concurrent users and parallelism
-
I/O – Unlimited Scalability
Sizing Guidelines
• Oracle can read 300+ MB/sec per GHz of CPU power
  – Direct Read, multi-block I/O, e.g., parallel full table scan
• An ‘average’ DW system should plan for a minimum of 100 MB/sec per GHz of CPU power
  – Typical mixture of I/O and CPU intensive operations
  – Ballpark number, adjust accordingly
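As a rough worked example of the sizing guideline, the sketch below turns the per-GHz figures into a throughput target. The function name and the example server (8 CPUs at 2 GHz) are assumptions chosen for illustration, not figures from the slides.

```python
def planned_io_mb_per_sec(cpus, ghz_per_cpu, mb_per_sec_per_ghz=100):
    """I/O throughput to plan for, using the slide's ballpark of
    100 MB/sec per GHz of CPU power for an 'average' DW system."""
    return cpus * ghz_per_cpu * mb_per_sec_per_ghz


# Hypothetical server: 8 CPUs at 2 GHz each.
baseline = planned_io_mb_per_sec(8, 2)                       # 'average' DW mix
scan_peak = planned_io_mb_per_sec(8, 2, mb_per_sec_per_ghz=300)  # direct-read scans
print(baseline, scan_peak)
```

Under these assumptions the I/O subsystem should sustain about 1600 MB/sec, and up to 4800 MB/sec if the CPUs are to stay busy during pure parallel full table scans.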
-
I/O – Minimize Requests
Sample Oracle Functionality
• Partition Pruning
  – Only touch the relevant data
• Star transformation
  – Bitmap index access instead of full data access
• Materialized Views
  – ‘Index’ your business questions, not only the data
• Table Compression
  – Store data more efficiently
• Prioritize requests accordingly
-
I/O – Partition Pruning
Partition Pruning
[Figure: Sales table range-partitioned by month, with partitions 04-Jan, 04-Feb, 04-Mar, 04-Apr, 04-May, 04-Jun]

SELECT sum(sales_amount)
FROM sales
WHERE sales_date BETWEEN '01-MAR-2004' AND '31-MAY-2004';

• Only relevant partitions will be accessed
  – Static pruning with values known in advance
  – Dynamic pruning uses internal recursive SQL to find the relevant partitions
• Minimizes I/O operations
  – Provides massive performance gains
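To make static pruning concrete, here is a minimal sketch that selects the monthly partitions overlapping a date range. It only illustrates the idea and says nothing about how Oracle implements pruning internally; the `PARTITIONS` layout and the `prune` function are hypothetical.

```python
from datetime import date

# Hypothetical monthly range partitions of the Sales table:
# name -> (first day of the month, first day of the following month).
PARTITIONS = {
    "04-Jan": (date(2004, 1, 1), date(2004, 2, 1)),
    "04-Feb": (date(2004, 2, 1), date(2004, 3, 1)),
    "04-Mar": (date(2004, 3, 1), date(2004, 4, 1)),
    "04-Apr": (date(2004, 4, 1), date(2004, 5, 1)),
    "04-May": (date(2004, 5, 1), date(2004, 6, 1)),
    "04-Jun": (date(2004, 6, 1), date(2004, 7, 1)),
}


def prune(partitions, lo, hi):
    """Return the partitions whose range overlaps the inclusive range [lo, hi]."""
    return [
        name
        for name, (start, end) in partitions.items()
        if start <= hi and lo < end  # partition range is [start, end)
    ]


# The predicate from the slide: BETWEEN 01-MAR-2004 AND 31-MAY-2004.
print(prune(PARTITIONS, date(2004, 3, 1), date(2004, 5, 31)))
# → ['04-Mar', '04-Apr', '04-May']
```

Half of the six partitions are never read, which is where the I/O saving comes from.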
-
I/O – Bitmap Indexing
Bitmap Indexes
• Bitmap indexes are usually 3 to 20 times smaller than B-tree indexes
  – Patented compressed storage
  – Ideal for set-based operations
• Star transformation uses bitmap indexes to identify base table records of interest
  – Replaces full table access with bitmap index access
  – Minimizes necessary I/O
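The set-based nature of bitmap indexes can be sketched with plain integers used as bit vectors. This is a toy model: the bitmaps here are uncompressed, the fact rows are made up, and the helper names are ours. It only shows how AND and OR on bitmaps identify the rows of interest before the base table is touched.

```python
def build_bitmaps(rows, column):
    """Build one bitmap per distinct value; bit i is set if row i has the value."""
    bitmaps = {}
    for rowid, row in enumerate(rows):
        bitmaps[row[column]] = bitmaps.get(row[column], 0) | (1 << rowid)
    return bitmaps


def matching_rowids(bitmap):
    """Translate a bitmap back into the list of row positions it marks."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical fact rows carrying two dimension keys.
FACTS = [
    {"region": "West", "quarter": "Q1"},
    {"region": "East", "quarter": "Q1"},
    {"region": "South", "quarter": "Q2"},
    {"region": "West", "quarter": "Q3"},
    {"region": "South", "quarter": "Q4"},
    {"region": "North", "quarter": "Q2"},
]

by_region = build_bitmaps(FACTS, "region")
by_quarter = build_bitmaps(FACTS, "quarter")

# region IN ('West', 'South') AND quarter IN ('Q2', 'Q3')
hits = (by_region["West"] | by_region["South"]) & (by_quarter["Q2"] | by_quarter["Q3"])
print(matching_rowids(hits))
# → [2, 3]
```

Only rows 2 and 3 would then be fetched from the fact table; the other rows are excluded by bitmap operations alone.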
-
I/O – Materialized Views
Materialized Views
[Figure: a query against the Detail data is redirected by Query Rewrite to the Monthly Sales by Region materialized view; other materialized views shown are Average Sales by Region and Quarterly Sales by Product]
Query: What were the sales in the West and South regions for the past three quarters?
A simple rollup from month to quarter provides an unprecedented performance gain with minimal I/O
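The rollup from month to quarter can be sketched as follows. The sample figures and the `rollup_to_quarter` helper are invented for illustration; the point is that the quarterly answer is computed from a handful of monthly summary rows rather than from the detail data.

```python
from collections import defaultdict

# Hypothetical contents of a "Monthly Sales by Region" summary:
# (region, year, month) -> sales amount.
MONTHLY_SALES_BY_REGION = {
    ("West", 2004, 1): 10,
    ("West", 2004, 2): 20,
    ("West", 2004, 3): 30,
    ("West", 2004, 4): 5,
    ("South", 2004, 1): 7,
    ("South", 2004, 4): 8,
    ("South", 2004, 5): 9,
}


def rollup_to_quarter(monthly):
    """Aggregate monthly summary rows into (region, year, quarter) totals."""
    quarterly = defaultdict(int)
    for (region, year, month), amount in monthly.items():
        quarter = (month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        quarterly[(region, year, quarter)] += amount
    return dict(quarterly)


print(rollup_to_quarter(MONTHLY_SALES_BY_REGION))
```

With these sample rows, West totals 60 for Q1 and 5 for Q2, and South totals 7 for Q1 and 17 for Q2.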
-
I/O – Table Compression
• Tables can be compressed
  – Compression can also be specified at the partition level
  – Indexes are not compressed
• Typical compression ratios range from 3:1 to 5:1
  – Compression is dependent upon the actual data
  – Compression algorithm based on removing data redundancy
• Key benefit is cost savings
  – Save TBs of storage without compromising performance or functionality
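A quick calculation shows what the quoted ratios mean in storage terms. The 12 TB starting size is an assumed example, and actual ratios depend on the data, as the slide notes.

```python
def compressed_size_tb(raw_tb, ratio):
    """Size after compression for a ratio given as N (meaning N:1)."""
    return raw_tb / ratio


RAW_TB = 12  # hypothetical uncompressed table size in TB
for ratio in (3, 5):
    size = compressed_size_tb(RAW_TB, ratio)
    print(f"{ratio}:1 -> {size:.1f} TB stored, {RAW_TB - size:.1f} TB saved")
```

For this assumed table, the typical range of ratios corresponds to storing between 2.4 TB and 4.0 TB instead of 12 TB.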
-
I/O – Prioritize Resources
Database Resource Manager
• Protect the system proactively
  – Maximum number of concurrent operations
  – Maximum degree of parallelism for a given priority group
• Subset of Database Resource Manager functionality
[Figure: Resource Scheduler. Priority groups: High Priority, Medium Priority, Low Priority. Workloads: Sales Analysis, AdHoc Reports, ETL Jobs. Limits shown: 20 users DOP 10; 200 users DOP 4; 5 users DOP 6]
-
Schema – which way to go?
Star Schema
• Innovative use of bitmap indexes and bitmap join indexes
• Support for Complex Star Schemas
  – Large numbers of dimensions
  – Multiple fact tables
  – Snowflake schemas
• Sophisticated partition pruning
• Parallel execution
-
Schema – which way to go?
[Figure: tables CUSTOMER_ORDERS and CUSTOMER_ORDER_PRODUCTS, each with range partitions ..., Jan, Feb, Mar, Apr, ...]
Both tables are partitioned by composite range-hash. The tables are partitioned on ORDER_DATE for the range dimension and CUSTOMER_ID for the hash dimension.
-
Schema – which way to go?
[Figure: tables CUSTOMER_ORDERS and CUSTOMER_ORDER_PRODUCTS, each with range partitions ..., Jan, Feb, Mar, Apr, ...]
Suppose a query examines the orders and products for January and February. First, Oracle can do partition-pruning with the range partitions.
-
Schema – which way to go?
[Figure: tables CUSTOMER_ORDERS and CUSTOMER_ORDER_PRODUCTS, each with range partitions ..., Jan, Feb, Mar, Apr, ...; the Jan partition of each table is paired for the join]
Second, Oracle will do a partition-wise join on the range partitions.
-
Schema – which way to go?
[Figure: tables CUSTOMER_ORDERS and CUSTOMER_ORDER_PRODUCTS, each with range partitions ..., Jan, Feb, Mar, Apr, ...; the paired Jan partitions are split into sub-partitions Jan, Hash 1 through Jan, Hash 4]
Third, Oracle will do a partition-wise join on the hash sub-partitions.
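The idea behind a partition-wise join on hash sub-partitions can be sketched in a few lines. This is a simplified model under stated assumptions: the row layout, the `order_id` column, and the use of `customer_id % n` as the hash function are all invented for illustration. Because both tables are hashed on the same key, matching rows always land in sub-partitions with the same number, so each pair can be joined independently (and therefore in parallel).

```python
def hash_partition(rows, key, n):
    """Split rows into n sub-partitions; modulo stands in for the hash function."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)
    return parts


def partition_wise_join(left_parts, right_parts, keys):
    """Join sub-partition i of the left table only with sub-partition i of the right."""
    result = []
    for left, right in zip(left_parts, right_parts):
        # Small hash join local to this pair of sub-partitions.
        index = {}
        for r in right:
            index.setdefault(tuple(r[k] for k in keys), []).append(r)
        for l in left:
            for r in index.get(tuple(l[k] for k in keys), []):
                result.append({**l, **r})
    return result


# Hypothetical January rows of the two tables.
orders = [
    {"customer_id": 1, "order_id": 100},
    {"customer_id": 2, "order_id": 101},
    {"customer_id": 5, "order_id": 102},
]
order_products = [
    {"customer_id": 1, "order_id": 100, "product": "A"},
    {"customer_id": 1, "order_id": 100, "product": "B"},
    {"customer_id": 5, "order_id": 102, "product": "C"},
]

joined = partition_wise_join(
    hash_partition(orders, "customer_id", 4),
    hash_partition(order_products, "customer_id", 4),
    keys=("customer_id", "order_id"),
)
print(sorted(row["product"] for row in joined))
# → ['A', 'B', 'C']
```

No sub-partition ever needs rows from a differently numbered sub-partition of the other table, which is what keeps each join small.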
-
Schema – which way to go?
Oracle’s functionality
• Star schema
  – Range-partition fact tables by time
  – Bitmap indexes on dimension-key columns of fact table
  – ‘Star transformation’ for end-user queries
• 3NF or normalized schema
  – Composite range-hash partitioning on large tables
  – ‘Partition-wise’ joins and parallel execution are the key performance enablers for joining large tables
• Hybrid environments
  – Use both approaches concurrently in the same system without affecting each other
Choose what fits your needs best! Oracle provides optimizations for any kind of setup.
-
Init.ora Settings
Lessons learned from History
• Do not de-tune Oracle
  – Very often, our performance engineers get improvements just by removing parameters
  – De-tuning can result in poor optimizer plans, wasted memory, and serialization points
• Trust Oracle
  – Don’t try to second-guess the software
  – With the exception of buffer and subject area related parameters, the system defaults are usually optimal
-
Init.ora – less is more
Basic Rules
• Ensure that data warehouse relevant parameters are set
  – Not all parameters are enabled by default in database releases prior to Oracle 10g
• Size and set buffer and memory related parameters
  – Two parameters are enough
• Do not touch other parameters unless necessary
-
Init.ora – less is more
[Figure: Memory Usage (MB), scale 0 to 6,000, against number of Users (1, 2, 4, 6, 8, 10, 12, 16, 20) for three settings: 7.5 MB, 15 MB, and Auto]
-
Summary
• Data Warehousing
  – ‘just a special kind of application’
• Size for I/O throughput
  – not for disk capacity
• Design according to your needs using the appropriate model, not the other way around
• Init.ora settings
  – Less is more
-
For More Information
oracle.com/datawarehousing
otn.oracle.com/datawarehousing