The Oracle9i Multi-Terabyte Data Warehouse
Jeff Parker
Manager, Data Warehouse Development
Amazon.com
Session id:
The Challenges
• Rapidly evolving business
• Growing data volumes
• Do more with less
The Challenges
• Rapidly evolving business
– New international markets
– Continual innovation of features on Amazon
  • Buy it used
  • Magazine subscriptions
– Marketplace partnerships: Toys R Us, Target
• Growing data volumes
• Do more with less
The Challenges
• Rapidly evolving business
• Growing data volumes
– 2X growth yearly over the past 5 years
– Currently 10 Terabytes of raw data
• Do more with less
[Chart: Data Growth, in Terabytes (axis 0 to 35), by year, 1999 to 2003]
The Challenges
• Rapidly evolving business
• Growing data volumes
• Do more with less
– Innovative use of technology and resources
– Throwing money and people at the problem is not an option
– Leverage existing investment in Oracle
Addressing the issues
• Rapidly evolving business
– Denormalize only for performance reasons
– Create a solution that allows new datasets to be brought into the DW rapidly, without high maintenance costs
• Growing data volumes
• Do more with less
Addressing the issues
• Rapidly evolving business
• Growing data volumes
– Dual database approach to ETL
  • Staging database for efficient transformation of large datasets. SQL and hash-joins allow transforms to scale in a non-linear fashion
  • Second database optimized for analytics
– Oracle as an API
  • Simplifies ETL architecture
  • Better scalability than traditional ETL tools
• Do more with less
Addressing the issues
• Rapidly evolving business
• Growing data volumes
• Do more with less
– One DW schema supports all countries
– Cut costs by eliminating unneeded software
– Data-driven Load functionality
The ETL Process
• Extract data from source
• The Load process
• Dimensional Transforms
[Diagram: data moves from OLTP through the Extract into flat files, then into the Staging Database and the Data Warehouse. RT = Row level data Transform; DT = Dimensional Transform.]
The ETL Process
• Extract data from source
– Can create one or more files to be loaded
– Must produce Metadata upon which the Load process can depend
• The Load Process
• Dimensional Transforms
Extract produced Metadata
• Describes each field in database type terms
• Changes as the dataset changes
• Can reference multiple files
• Very reliable
• No additional overhead
XML Based Metadata
<DATA CHARSET="UTF8" DELIMITER="\t" ROWS="1325987">
  <COLUMNS>
    <COLUMN ID="dataset_id" DATA_TYPE="NUMBER" DATA_PRECISION="38" DATA_SCALE="0"/>
    <COLUMN ID="dataset_name" DATA_TYPE="VARCHAR2" DATA_LENGTH="80"/>
    <COLUMN ID="CREATION_DATE" DATA_TYPE="DATE" DATE_MASK="YYYY/MM/DD.HH24:MI:SS"/>
    <COLUMN ID="CREATED_BY" DATA_TYPE="VARCHAR2" DATA_LENGTH="8"/>
  </COLUMNS>
  <FILES>
    <FILE PATHNAME="/flat/datasets_20020923_US.txt.1"/>
    <FILE PATHNAME="/flat/datasets_20020923_US.txt.2"/>
  </FILES>
</DATA>
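A minimal sketch of how a load process might read this Metadata, assuming Python and its standard `xml.etree.ElementTree` parser. The deck does not say what language the real loader used; the function names `column_type` and `read_metadata` are illustrative, not from the presentation.

```python
import xml.etree.ElementTree as ET

# Sample metadata modeled on the slide (quotes normalized).
METADATA = """<DATA CHARSET="UTF8" DELIMITER="\\t" ROWS="1325987">
  <COLUMNS>
    <COLUMN ID="dataset_id" DATA_TYPE="NUMBER" DATA_PRECISION="38" DATA_SCALE="0"/>
    <COLUMN ID="dataset_name" DATA_TYPE="VARCHAR2" DATA_LENGTH="80"/>
    <COLUMN ID="CREATION_DATE" DATA_TYPE="DATE" DATE_MASK="YYYY/MM/DD.HH24:MI:SS"/>
    <COLUMN ID="CREATED_BY" DATA_TYPE="VARCHAR2" DATA_LENGTH="8"/>
  </COLUMNS>
  <FILES>
    <FILE PATHNAME="/flat/datasets_20020923_US.txt.1"/>
    <FILE PATHNAME="/flat/datasets_20020923_US.txt.2"/>
  </FILES>
</DATA>"""


def column_type(col):
    """Render one <COLUMN> element as an Oracle column type string."""
    data_type = col.get("DATA_TYPE")
    if data_type == "VARCHAR2":
        return f"VARCHAR2({col.get('DATA_LENGTH')})"
    if data_type == "NUMBER" and col.get("DATA_PRECISION"):
        return f"NUMBER({col.get('DATA_PRECISION')},{col.get('DATA_SCALE', '0')})"
    return data_type  # e.g. DATE, or NUMBER without a stated precision


def read_metadata(xml_text):
    """Return the pieces of the metadata that a loader would need."""
    root = ET.fromstring(xml_text)
    return {
        "charset": root.get("CHARSET"),
        "delimiter": root.get("DELIMITER"),
        "rows": int(root.get("ROWS")),
        "columns": [(c.get("ID").lower(), column_type(c)) for c in root.iter("COLUMN")],
        "files": [f.get("PATHNAME") for f in root.iter("FILE")],
    }


meta = read_metadata(METADATA)
for name, ctype in meta["columns"]:
    print(name, ctype)
```

Because the column list, row count, and file list all come from one document, a change in the dataset only requires the extract to emit new Metadata; nothing downstream is hand-edited.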
The ETL Process
• Extract data from source
• The Load Process
– Makes extensive use of External Tables
– MERGE and Bulk Insert
– Contains integrated DBA tasks
– Every load is tracked in an operational database
• Dimensional Transforms
The Load Process
[Diagram: Metadata and data files are exposed through an External Table (XT); row level transforms and a SQL Insert/Merge load the Data Warehouse. Load tasks and DBA tasks log load stats and perform cleanup, recorded in OPSDW Operational Data.]
The Load Process
• External Tables
– Provide access to files on the operating system
– Are a building block in a broader ETL process
• MERGE & Bulk Insert
• Integrated DBA tasks
The External Table
• Created by using Metadata from the Extract process
• Data is read-only
• No indexes
• Use DBMS_STATS to set number of rows
[Diagram: External Table DATA_SETS over the Data Files]
DATA_SETS
  dataset_id NUMBER
  dataset_name VARCHAR(80)
  creation_date DATE
  created_by VARCHAR(8)
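External tables have no gathered statistics, so the optimizer needs to be told how many rows to expect. One way to do that, sketched here in Python under the assumption that the row count comes from the extract Metadata, is to generate a call to `DBMS_STATS.SET_TABLE_STATS` with its `numrows` parameter. The owner name `etl` and the helper name `set_row_count_call` are hypothetical; the deck does not show the actual call it used.

```python
def set_row_count_call(owner, table, num_rows):
    """Build an anonymous PL/SQL block that records num_rows as the table's row count."""
    if num_rows < 0:
        raise ValueError("num_rows must be non-negative")
    return (
        "BEGIN\n"
        "  DBMS_STATS.SET_TABLE_STATS(\n"
        f"    ownname => '{owner.upper()}',\n"
        f"    tabname => '{table.upper()}',\n"
        f"    numrows => {int(num_rows)});\n"
        "END;"
    )


# 'etl' is a hypothetical schema owner; the row count is the ROWS value from the metadata slide.
print(set_row_count_call("etl", "XT_datasets_77909", 1325987))
```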
Example External Table
1. Copy the data to the database server. Data must reside in a file system location specified by the DBAs.
  create directory DAT_DIR as '/stage/flat';
Example External Table
2. Create the external table using the DDL generated from the extract Metadata.
CREATE TABLE XT_datasets_77909 (
  dataset_id NUMBER, dataset_name VARCHAR2(80),
  creation_date DATE, created_by VARCHAR2(8) )
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY dat_dir
  ACCESS PARAMETERS (
    records delimited by newline
    characterset UTF8
    fields terminated by '\t' )
  LOCATION ( '/flat/datasets_20020923_US.txt' ) )
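Since the external table is created from the extract Metadata, the DDL can be produced mechanically. The following is a sketch of one way to do that in Python, not the presenter's actual generator; `external_table_ddl` and its parameters are illustrative names, and file names are passed through exactly as the Metadata lists them.

```python
def external_table_ddl(table, columns, directory, files, charset="UTF8", delimiter="\\t"):
    """Build CREATE TABLE ... ORGANIZATION EXTERNAL DDL from (name, type) column pairs."""
    col_lines = "\n  ,".join(f"{name} {ctype}" for name, ctype in columns)
    locations = ", ".join(f"'{f}'" for f in files)
    return (
        f"CREATE TABLE {table} (\n"
        f"   {col_lines}\n"
        ")\n"
        "ORGANIZATION EXTERNAL (\n"
        "  TYPE ORACLE_LOADER\n"
        f"  DEFAULT DIRECTORY {directory}\n"
        "  ACCESS PARAMETERS (\n"
        "    records delimited by newline\n"
        f"    characterset {charset}\n"
        f"    fields terminated by '{delimiter}'\n"
        "  )\n"
        f"  LOCATION ({locations})\n"
        ")"
    )


# Column definitions and file names as they appear on the metadata slide.
ddl = external_table_ddl(
    table="XT_datasets_77909",
    columns=[
        ("dataset_id", "NUMBER"),
        ("dataset_name", "VARCHAR2(80)"),
        ("creation_date", "DATE"),
        ("created_by", "VARCHAR2(8)"),
    ],
    directory="dat_dir",
    files=["/flat/datasets_20020923_US.txt.1", "/flat/datasets_20020923_US.txt.2"],
)
print(ddl)
```

Listing several files in `LOCATION` is what lets one external table cover a dataset that the extract split into multiple files.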
The External Table
• No pre-staging of data
• Ability to describe a flat file to Oracle
• Handles horizontally partitioned files
• Good error messaging
The Load Process
• External Tables
• MERGE
– Can be run in parallel
– Combined with external tables, provides a powerful set of ETL tools
• Integrated DBA tasks
MERGE
• Allows for update or insert in a single statement
– If key value already exists
  • Yes, update row
  • No, insert row
• MERGE statement is auto-generated
• Row level column transforms are supported
MERGE
External table    Metadata        Permanent table
Dataset_id        Dataset_id      Dataset_id
Dataset_name      Dataset_name    Dataset_name
Created_by        Created_by      Created_by
                  sysdate         last_updated
MERGE example
MERGE INTO datasets ds
USING ( SELECT xt.dataset_id
              ,xt.dataset_name
              ,xt.creation_date
              ,nvl(xt.created_by, 'nobody') AS created_by
              ,sysdate AS last_updated
        FROM XT_datasets_77909 xt ) src
ON ( src.dataset_id = ds.dataset_id )
WHEN MATCHED THEN UPDATE SET
   ds.dataset_name = src.dataset_name
  ,ds.creation_date = src.creation_date
  ,ds.created_by = src.created_by
  ,ds.last_updated = sysdate
WHEN NOT MATCHED THEN INSERT
  ( dataset_id, dataset_name, creation_date, created_by, last_updated )
VALUES
  ( src.dataset_id, src.dataset_name, src.creation_date, src.created_by, sysdate )
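The deck says the MERGE statement is auto-generated and that row level column transforms are supported. A rough sketch of such a generator follows, assuming Python and a single-column key; `merge_statement` and the `transforms` mapping are hypothetical names, and the real generator may well differ.

```python
def merge_statement(target, source, key, columns, transforms=None):
    """Build a MERGE (upsert) from an external table into a permanent table.

    columns    -- non-key columns to load
    transforms -- optional {column: SQL expression over alias xt} for row level transforms
    """
    transforms = transforms or {}
    all_cols = [key] + list(columns)
    select_list = "\n        ,".join(
        f"{transforms[c]} AS {c}" if c in transforms else f"xt.{c}" for c in all_cols
    )
    # The key column is deliberately left out of the UPDATE list.
    update_list = "\n  ,".join(f"ds.{c} = src.{c}" for c in columns)
    insert_cols = ", ".join(all_cols)
    insert_vals = ", ".join(f"src.{c}" for c in all_cols)
    return (
        f"MERGE INTO {target} ds\n"
        f"USING ( SELECT {select_list}\n"
        f"        FROM {source} xt ) src\n"
        f"ON ( src.{key} = ds.{key} )\n"
        "WHEN MATCHED THEN UPDATE SET\n"
        f"   {update_list}\n"
        f"WHEN NOT MATCHED THEN INSERT ( {insert_cols} )\n"
        f"VALUES ( {insert_vals} )"
    )


sql = merge_statement(
    target="datasets",
    source="XT_datasets_77909",
    key="dataset_id",
    columns=["dataset_name", "creation_date", "created_by", "last_updated"],
    transforms={
        "created_by": "NVL(xt.created_by, 'nobody')",
        "last_updated": "SYSDATE",
    },
)
print(sql)
```

Putting every transform inside the `USING` subquery keeps the `UPDATE` and `INSERT` branches purely mechanical, so the same generator works for any dataset described by Metadata.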
MERGE
• Issues we faced
– Duplicate records in the dataset
– NESTED-LOOPS from external table
– Parallelism is not enabled by default
– Bulk Load partition determination
The Load Process
• External Tables
• MERGE
• Integrated DBA tasks
– Reduces workload required by the DBA team
– Streamlines the load process
– Eliminates human error
Integrated DBA Tasks
• Provided by the DBA team
– Managed by the DBA team
– ETL team does not need special knowledge of table layout
Integrated DBA Tasks
• Truncate Partition
Developer makes call
  truncate_partition( 'TABLE-NAME', partition-key1, partition-key2, partition-key3 )
DBA utility translates this and executes
  alter table TABLE-NAME drop partition dbi20020930_101;
Integrated DBA Tasks
• Analyze Partition
Developer makes call
  analyze_partition( 'TABLE-NAME', partition-key1, partition-key2, partition-key3 )
DBA utility translates this and executes
  dbms_stats.gather_table_stats( ownname, tabname, partname, cascade, estimate_percent, granularity );
Integrated DBA Tasks
• Return Partition Name
Developer makes call
  get_partition_name( 'TABLE-NAME', partition-key1, partition-key2, partition-key3 )
DBA utility translates this and returns the appropriate name of the partition. This is very useful when bulk loading tables.
Integrated DBA Tasks
• Partitioning utilities
– Help to streamline the process
– Reduce the workload of the DBA team
– Help to eliminate the problem of double loads for Snapshot tables and partitions
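The idea behind these utilities is that developers pass partition keys and the DBA-owned code resolves the physical partition name. The Python sketch below illustrates that separation only; the `PartitionCatalog` class, the table name `ORDERS_SNAPSHOT`, and the partition name `P_20020930_101` are all made up for illustration. The deck does not show how the real utility maps keys to names.

```python
class PartitionCatalog:
    """Hypothetical stand-in for the DBA-managed mapping of partition keys to partition names."""

    def __init__(self, mapping):
        # mapping: {(TABLE_NAME, (key1, key2, ...)): partition_name}
        self._mapping = dict(mapping)

    def get_partition_name(self, table, *keys):
        """Resolve partition keys to a partition name; the caller never needs the naming scheme."""
        try:
            return self._mapping[(table.upper(), keys)]
        except KeyError:
            raise LookupError(f"no partition of {table} for keys {keys}") from None

    def truncate_partition_sql(self, table, *keys):
        """Statement issued for a truncate_partition call (the slide shows DROP PARTITION)."""
        name = self.get_partition_name(table, *keys)
        return f"ALTER TABLE {table.upper()} DROP PARTITION {name}"

    def bulk_insert_target(self, table, *keys):
        """Table reference naming one partition explicitly, for use in a bulk INSERT."""
        name = self.get_partition_name(table, *keys)
        return f"{table.upper()} PARTITION ({name})"


# All names and keys here are invented for the example.
catalog = PartitionCatalog({("ORDERS_SNAPSHOT", ("20020930", 101)): "P_20020930_101"})
print(catalog.truncate_partition_sql("orders_snapshot", "20020930", 101))
print(catalog.bulk_insert_target("orders_snapshot", "20020930", 101))
```

Clearing the target partition before each load in this way is also what makes a rerun safe: loading the same snapshot twice replaces the partition instead of doubling its rows.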
The Load Process
• External Tables
– Provide access to flat files outside the database
• MERGE
– Parallel “upsert” simplifies ETL
– Row level transforms can be performed in SQL
• Integrated DBA tasks
– Reduces workload required by the DBA team
– Streamlines the load process
– Eliminates human error
• Loads are repeatable processes
Summary
• Reduction in time to integrate new subject areas
• Oracle parallelism scales well
• Eliminated unneeded software
Summary
• Oracle has delivered on the DW promise
– Oracle External table combined with MERGE is a viable alternative to other ETL tools
– ETL tools are ready today
Questions & Answers
Reminder – please complete the OracleWorld session survey
Thank you.