Oracle BIWA SIG Basics
• Worldwide association of 2000 professionals interested in Oracle Database-centric business intelligence, data warehousing, and analytical products, features and options.
• Membership is FREE
• Open forum to foster success in use and development of Oracle BIWA products.
• BIWA’s goals include sharing “best practices” and “novel and interesting use cases” of Oracle BIWA-centric technology.
• See Mission Statement and Charter at oraclebiwa.org.
Next Oracle BIWA SIG Conference
BIWA Training Days at COLLABORATE 10 – IOUG Forum
• “Get Analytical with BIWA Training Days”
• April 18-22, 2010
• Mandalay Bay Convention Center, Las Vegas, Nevada
• REGISTER with Offer Code “BIWA2010” for IOUG Member Discount Rate
• See oraclebiwa.org for details and links
Top 5 Tips on: Reducing Storage Cost while Improving Performance
Jean-Pierre Dijcks
Data Warehouse Product Management
Agenda
• Tip 1: Appropriate Hardware
• Tip 2: Tier your Storage
• Tip 3: Partition your Data
• Tip 4: Compress your Data
• Tip 5: Think, Plan and Design
• Q&A
Agenda
• Tip 5: Think, Plan and Design
• Tip 1: Appropriate Hardware
• Tip 2: Tier your Storage
• Tip 3: Partition your Data
• Tip 4: Compress your Data
• Software Forces a Paradigm Shift
• Q&A
Tip #5: Think, Plan and Design
• Understand the requirements
  • Data retention rates
  • What to do with the older data
  • What are you doing with the newer data
  • Performance requirements for all data (not just the latest stuff)
• Plan for the worst (kind of)
  • What is the performance goal and can you still achieve that in 6 months or 2 years
  • What is the data retention rate and can you deal with this at double your data size?
• Design the system to still work tomorrow
Tip #5: Think, Plan and Design
• Understand or learn about the trends
• Hardware
  • Low-price commodity servers
  • High refresh rates of components (CPUs etc.)
  • Ever growing sizes, speeds at ever dropping prices
• Software
  • More aligned with hardware
  • Push down into storage of data intensive tasks
  • Consolidation and more workloads are thrown at software
Tip #5: Think, Plan and Design
• An interpretation of the meaning of these trends is:
  • We will see self-provisioning of vast resources by (end) users
  • This will be achieved by a flexible grid of resources being made available
  • More people will get and use more compute power
  • More and more workloads are run on the “same” hardware
  • Integrated software services will provide the value add for these users and make consolidation work…
This has major implications for all of us… I think…
Tip #1: Balance your Hardware
Driver: Flexibility in Performance
[Figure: storage hierarchy with Cost and Speed axes running from Higher to Lower. Tiers shown: Memory, etc.; Solid State Disks; Flash Cards and Disks; 2TB 10K RPM SATA Disks; other high capacity media. ILM moves data Upward and Downward through the hierarchy.]
Tip #1: Balance your Hardware
Driver: Flexibility in Performance
[Figure: storage hierarchy of Memory, Flash Technology, SAS drives, SATA drives and Off-line Data Archives. Data share labels: < 10%, < 50% and 100% of your data. Query share labels: 60%, 35% and 5% of your queries.]
Disclaimer: Illustration purposes only!
Tip #2: Tier your Storage
Driver: Cost and Performance
[Figure: storage tiers ranked by Speed and Cost; the share of capacity on each tier doubles as a cost indicator.]
Tier               Performance (% of capacity)   Capacity (% of capacity)
SATA drives        0                             99.75
Flash Technology   10                            0
Memory             5                             0.25
SAS drives         85                            0
Disclaimer: Illustration purposes only!
Tip #2: Tier your Storage
Driver: Cost and Performance
[Figure: storage tiers ranked by Speed and Cost; the share of capacity on each tier doubles as a cost indicator.]
Tier               Balanced Performance (% of capacity)   Balanced Capacity (% of capacity)
SATA drives        0                                      98
Flash Technology   5                                      1.5
Memory             1                                      0.5
SAS drives         94                                     0
Disclaimer: Illustration purposes only!
Tip #1 and Tip #2
Balance and Flexibility
• Create a grid of compute and storage resources
• Allow for a hierarchy of storage solutions within the grid
• Balance the hardware to:
  • Achieve acceptable performance for the majority workload
  • Achieve great performance for mission critical actions
  • Achieve a reasonable price / performance balance
• Do not size just for performance, nor just for capacity
Very diverse workloads + same hardware = need for flexibility
Tip #3: Partition your Data
Impact: Performance and Ease of Maintenance
• Maintenance:
  • Easier to work on smaller chunks of data
  • Allows specification of separate management and performance strategies on a smaller chunk
• Performance:
  • In maintenance operations (as shown above)
  • By reducing the data volume to scan
  • A potential way of allowing parallel operations to optimize data processing
The Concept of Partitioning
Simple yet powerful
[Figure: three versions of the SALES table. Large Table: difficult to manage. Partition (Jan, Feb): divide and conquer, easier to manage, improve performance. Composite Partition (Jan, Feb; Europe, USA): higher performance, more flexibility to match business needs.]
Partition for Performance
Partition Pruning
Q: What was the total sales amount for May 20 and May 21 2009?
Select sum(sales_amount)
From SALES
Where sales_date >= to_date('05/20/2009','MM/DD/YYYY')
And sales_date < to_date('05/22/2009','MM/DD/YYYY');
[Figure: Sales Table with daily partitions 5/17, 5/18, 5/19, 5/20, 5/21 and 5/22.]
Only the 2 relevant partitions are read
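The pruning idea on this slide can be sketched in a few lines of Python. This is only a toy model, not how Oracle implements pruning: the `PARTITIONS` dictionary, the `prune` and `total_sales` helpers and all sales amounts are invented for illustration, and the query is assumed to mean the half-open range from May 20 up to, but not including, May 22.

```python
from datetime import date

# Daily partitions keyed by sales date; the amounts are invented sample values.
PARTITIONS = {
    date(2009, 5, 17): [120.0, 80.0],
    date(2009, 5, 18): [200.0],
    date(2009, 5, 19): [50.0, 25.0],
    date(2009, 5, 20): [300.0, 100.0],
    date(2009, 5, 21): [150.0],
    date(2009, 5, 22): [75.0],
}


def prune(partitions, start, end):
    """Return the partition keys that can hold rows with start <= key < end."""
    return [day for day in sorted(partitions) if start <= day < end]


def total_sales(partitions, start, end):
    """Sum amounts while scanning only the partitions that survive pruning."""
    scanned = prune(partitions, start, end)
    total = sum(sum(partitions[day]) for day in scanned)
    return total, scanned


total, scanned = total_sales(PARTITIONS, date(2009, 5, 20), date(2009, 5, 22))
print(total, [d.isoformat() for d in scanned])
```

Four of the six partitions are never touched, which is where the saving comes from: the work is proportional to the data the query needs, not to the size of the table.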
Parallel Processing
Partition Wise Join
Both tables have the same degree of parallelism and are partitioned the same way on the join column (cust_id)
A large join is divided into multiple smaller joins, each of which joins a pair of partitions in parallel
Select sum(sales_amount)
From Sales s, Customer c
Where s.cust_id = c.cust_id;
[Figure: Sales (range partition May 18th 2008, sub parts 1 to 4) joined to Customer (sub parts 1 to 4), one pair of matching sub parts at a time.]
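A rough Python sketch of the partition-wise join follows. It is a simplified model under stated assumptions: both tables are split into four buckets with `cust_id % 4` standing in for the real partitioning function, `cust_id` is assumed unique in Customer, the sample rows are invented, and the pairs are processed one after another rather than by parallel workers.

```python
NUM_PARTITIONS = 4  # matches the four sub parts drawn on the slide


def hash_partition(rows, key, n=NUM_PARTITIONS):
    """Split rows into n buckets using the join key modulo n."""
    buckets = [[] for _ in range(n)]
    for row in rows:
        buckets[row[key] % n].append(row)
    return buckets


def join_pair(sales_part, customer_part):
    """Join one pair of matching partitions; return its partial sum of sales_amount."""
    known_customers = {c["cust_id"] for c in customer_part}
    return sum(s["sales_amount"] for s in sales_part if s["cust_id"] in known_customers)


def partition_wise_sum(sales, customers):
    """Compute sum(sales_amount) over the join as a sum of per-pair partial results."""
    sales_parts = hash_partition(sales, "cust_id")
    customer_parts = hash_partition(customers, "cust_id")
    # Each pair is independent of the others, so each could go to its own worker.
    return sum(join_pair(s, c) for s, c in zip(sales_parts, customer_parts))


# Invented sample rows; customer 9 has sales but no Customer row.
sales = [
    {"cust_id": 1, "sales_amount": 10.0},
    {"cust_id": 2, "sales_amount": 20.0},
    {"cust_id": 5, "sales_amount": 5.0},
    {"cust_id": 9, "sales_amount": 7.0},
]
customers = [{"cust_id": 1}, {"cust_id": 2}, {"cust_id": 5}, {"cust_id": 6}]

print(partition_wise_sum(sales, customers))
```

Because both sides use the same partitioning function on the join column, a row in one Sales bucket can only match rows in the Customer bucket with the same number, so no pair ever needs data from another pair.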
Exadata Storage Index
Transparent I/O Elimination with No Overhead
• Exadata Storage Indexes maintain summary information about table data in memory
  • Stores MIN and MAX values of filter columns
  • Typically one index entry for every MB of disk
• Eliminates disk I/Os if MIN and MAX can never match “where” clause of a query
  • “Negative index”
• Completely automatic and transparent
Order_date    Ship_date     Cust_ID   Prod_ID   Amount
03-SEP-2009   19-SEP-2009   10075     32932     10,000.00
03-SEP-2009   05-SEP-2009   20098     20098     20,000.00
03-SEP-2009   07-OCT-2009   10089     20010     15,000.00
03-SEP-2009   01-OCT-2009   20100     10000     35,000.00
03-SEP-2009   19-OCT-2009   80300     30000     10,000.00
03-SEP-2009   03-NOV-2009   10000     2030      40,000.00
Rows 1-3: MIN ship_date = '05-SEP-2009', MAX ship_date = '07-OCT-2009'
Rows 4-6: MIN ship_date = '01-OCT-2009', MAX ship_date = '03-NOV-2009'
Select * from orders where ship_date < '30-SEP-2009'
Only first set of rows can match
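The min/max elimination described on this slide can be modeled with a short Python sketch. It is a toy illustration only: a "region" here is three rows rather than roughly 1 MB of disk, the function names are hypothetical, and the min/max values are computed from the six rows of the orders table rather than copied from the slide.

```python
from datetime import date

# Rows of the slide's orders table: (order_date, ship_date, cust_id, prod_id, amount)
ORDERS = [
    (date(2009, 9, 3), date(2009, 9, 19), 10075, 32932, 10000.00),
    (date(2009, 9, 3), date(2009, 9, 5), 20098, 20098, 20000.00),
    (date(2009, 9, 3), date(2009, 10, 7), 10089, 20010, 15000.00),
    (date(2009, 9, 3), date(2009, 10, 1), 20100, 10000, 35000.00),
    (date(2009, 9, 3), date(2009, 10, 19), 80300, 30000, 10000.00),
    (date(2009, 9, 3), date(2009, 11, 3), 10000, 2030, 40000.00),
]
REGION_SIZE = 3  # rows per storage region in this toy model


def build_storage_index(rows, region_size=REGION_SIZE):
    """Record (min, max) of ship_date for each consecutive region of rows."""
    index = []
    for start in range(0, len(rows), region_size):
        ship_dates = [r[1] for r in rows[start:start + region_size]]
        index.append((min(ship_dates), max(ship_dates)))
    return index


def scan_ship_date_before(rows, index, cutoff, region_size=REGION_SIZE):
    """Return rows with ship_date < cutoff and the number of regions actually read."""
    matches, regions_read = [], 0
    for i, (lowest, _highest) in enumerate(index):
        if lowest >= cutoff:
            continue  # no row in this region can satisfy ship_date < cutoff
        regions_read += 1
        region = rows[i * region_size:(i + 1) * region_size]
        matches.extend(r for r in region if r[1] < cutoff)
    return matches, regions_read


index = build_storage_index(ORDERS)
matches, regions_read = scan_ship_date_before(ORDERS, index, date(2009, 9, 30))
print(regions_read, len(matches))
```

The index never says where a matching row is, only where one cannot be, which is why the slide calls it a "negative index".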
Tip #3: Partition your Data
Impact: Performance and Ease of Maintenance
• This does not only mean “Apply Partitioning”; there is more that Oracle can do to allow better performance:
  • Partitioning – of course… To improve scan speeds and maintenance operations
  • Storage Indexes – To improve scan speeds
  • Smart Scans – To reduce data moved around
Breaking up a large data set delivers both performance and ease of maintenance
Tip #3: Partition your Data
Driver: Flexibility in Performance
[Figure: storage hierarchy of SAS drives, SATA drives and Off-line Data Archives. Data share labels: < 75% and 100% of your data. Query share labels: 95% and 5% of your queries.]
Disclaimer: Illustration purposes only!
Improve scan rates to leverage slower storage tiers = Downward Mobility
Tip #4: Compress your Data
Impact: Cost and Performance
Compression in Oracle:
• “Data Warehouse” compression
  • 2 – 3x compression ratio
  • Included in DB license
• “OLTP” compression
  • 3 – 4x compression ratio
  • Database Option – Advanced Compression
• Exadata Hybrid Columnar Compression
  • 10 – 50x compression ratio
  • Included in Oracle Exadata license
Tip #4: Compress your Data
Hybrid Columnar Compression
• Data is grouped by column and then compressed
• Query Mode for data warehousing
  • Optimized for speed
  • 10X compression typical
  • Scans improve proportionally
• Archival Mode for infrequently accessed data
  • Optimized to reduce space
  • 15X compression is typical
  • Up to 50X for some data
Tip #4: Compress your Data
Usage Matrix
• Apply compression on a per Partition or higher level
• Change compression over the lifetime of a Partition or Table
• Both EHCC and DW compression will start to “decompress” when data is updated
Workload                                               Preferred        Possible
Bulk Load (write once, read many times)                EHCC – Query     DW
Operational DW (write, update, delete, update, read)   OLTP             ---
Archive (static data, read once in a while)            EHCC – Archive   DW
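The usage matrix can be read as a simple lookup, sketched below in Python. The workload keys, the `choose_compression` helper and the `exadata_available` flag are hypothetical names; falling back to the "Possible" column when Exadata is not available is an assumption drawn from the earlier note that EHCC is part of the Exadata license, not something the slide states.

```python
# Usage matrix from the slide: workload -> (preferred compression, possible alternative).
# None stands for the "---" entry, meaning no alternative is listed.
USAGE_MATRIX = {
    "bulk_load": ("EHCC Query", "DW"),
    "operational_dw": ("OLTP", None),
    "archive": ("EHCC Archive", "DW"),
}


def choose_compression(workload, exadata_available=True):
    """Pick the preferred compression, or the alternative when EHCC is unavailable."""
    preferred, possible = USAGE_MATRIX[workload]
    if preferred.startswith("EHCC") and not exadata_available:
        # Assumption: without Exadata storage, use the "Possible" column instead.
        return possible
    return preferred


for workload in USAGE_MATRIX:
    print(workload, "->", choose_compression(workload))
```

Since compression can be set per partition and changed over a partition's lifetime, a lookup like this would be applied partition by partition as the data ages, not once per table.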
Tip #4: Compress your Data
Applying Compression across Partitions
[Figure: partitions along a timeline labeled Day 1, Day 2, Day 8, Day 9, Day 10, Month 7 and Month 8. Compression labels: EHCC Archive Mode, EHCC Query Mode, OLTP Compression.]
Tip #4: Compress your Data
Improving performance and lowering cost
[Figure: storage hierarchy of Memory, Flash Technology, SAS drives and SATA drives. Data share labels: < 30% and 100% of your data. Query share labels: 75% and 25% of your queries.]
Disclaimer: Illustration purposes only!
Move more data onto high performance storage tiers = Upward Mobility
Software forces a Paradigm Shift
Applying software changes the balance
• Adding software into the mix fundamentally changes the way we use and think about storage
• Software driven partitioning of data changes the cost per scanned TB in relation to total data volume
• Compression changes the cost per TB stored significantly
• Compression changes the cost per scanned TB in relation to non-compressed data
Compounding Returns
Less Storage, Better Performance
• 10 TB of user data: requires 10 TB of IO
• 1 TB with compression
• 100 GB with partition pruning
• 20 GB with Storage Indexes
• 5 GB with Smart Scans
• Sub-second response times
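The arithmetic behind the compounding effect is easy to check. The sketch below takes the I/O volumes from the slide, assumes decimal units (1 TB = 1000 GB), and derives the reduction contributed by each step and the cumulative reduction; the `STEPS` list and `reduction_factors` helper are illustrative names.

```python
# I/O volume after each step in GB, as listed on the slide.
# Assumption: decimal units, so 10 TB = 10,000 GB.
STEPS = [
    ("user data, no optimization", 10_000),
    ("compression", 1_000),
    ("partition pruning", 100),
    ("Storage Indexes", 20),
    ("Smart Scans", 5),
]


def reduction_factors(steps):
    """Return (step name, factor versus previous step, cumulative factor) per step."""
    baseline = steps[0][1]
    result = []
    for (_, previous_gb), (name, gb) in zip(steps, steps[1:]):
        result.append((name, previous_gb / gb, baseline / gb))
    return result


for name, step_factor, cumulative in reduction_factors(STEPS):
    print(f"{name}: {step_factor:g}x this step, {cumulative:g}x overall")
```

The individual steps are modest (10x, 10x, 5x and 4x), but because each one applies to what the previous step left over, they multiply to a 2000x reduction in I/O.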
Conclusion
Software and Hardware is the Solution
• Building a hierarchy of storage solutions allows you to:
  • Be more flexible
  • Deliver better performance for lower cost
• Partitioning and compression are technologies that change the hardware status quo
  • Partitioning allows slower HW to deliver better performance
  • Compression allows faster HW to hold more data and deliver better performance
With today’s technology you can improve performance while reducing storage cost
Q&A