1
Informix Warehouse & Informix Warehouse Accelerator
Overview
• Scripted for Tech Sales audience
• March 2011
2
Disclaimer
– © Copyright IBM Corporation 2011. All rights reserved.
– U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
– THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.
• IBM, the IBM logo, ibm.com, Cognos, SPSS and Informix are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml
• Other company, product, or service names may be trademarks or service marks of others.
3
Agenda
• Data Warehouse Industry Trends
• Data Warehousing on Informix
– History & Roadmap
• Informix Data Warehouse
– Informix Warehouse Tooling – ETL
– IDS 11.70 Server Features
• Informix Warehouse Accelerator
• Q&A
4
Data Warehousing Industry Trends
5
State of Data Warehousing in 2011
DBMS Market in 2011:
• DBMS market at the close of 2009 was approximately $21.2 billion (2010 data not yet available)
• Data Warehouse DBMS market was approximately 35% of the DBMS market or $7.42 billion
Key Findings:
• Data warehouse DBMSs have evolved to a broader analytics infrastructure supporting operational analytics, corporate performance management and other new applications and uses.
• Cost is driving interest in alternative architectures but performance optimization is driving multi-tiered data architectures and a variety of deployment options - notably a strong interest in in-memory data mart deployments.
6
State of Data Warehousing, Cont’d
Market Dynamics for 2011
• Today, smaller data warehouses (those with less than 5 TB of source system extracted data, SSED) are the only "data warehouse" for the entire organization and commonly meet the organization's analytic needs. Gartner estimates that between 70% and 75% of all systems referred to as EDWs are actually scoped to a single business department.
Analysis:
• Optimization techniques such as summaries, aggregates and indexes are simply the result of performance restrictions inherent to normalized data and the way the RDBMS manages rows and columns.
7
State of Data Warehouse, Cont’d
A Glimpse Into the Future
• Vendor solutions began to focus even more on the ability to isolate and prioritize workload types including strategies for dual warehouse deployments and mixing OLTP and OLAP on the same platform.
• In-memory DBMS solutions provide a technology which enables OLTP/OLAP combined solutions. Organizations should increase their emphasis on financial viability during 2011 and even into 2012 as well as aligning their analytics strategies with vendor road maps when choosing a solution.
8
Data Warehouse Trends for the CIO, 2011-2012
Data Warehouse Appliances:
• DW appliances are not a new concept. Most vendors have developed an appliance offering or promote certified configurations. Main reason for consideration is simplicity.
The Resurgence of Data Marts:
• Data marts can be used to optimize DW by offloading part of the workload, returning greater performance to the warehousing environment
Column-Store DBMSs
• CIOs should be aware that their current DBMS vendor may offer a column-store solution. Don’t just buy a column-store-only DBMS because a column store was recommended by your team.
In-Memory DBMSs
• IMDBMS technology also introduces a higher probability that analytics and transactional systems can share the same database.
9
Informix Warehouse History
Informix has 3 database products:
• XPS for MPP data warehousing
• Red Brick for star schema data marts/data warehousing
• Informix Dynamic Server (IDS) for OLTP & (now) data warehousing
10
Existing IDS Warehousing Features
• Performance & Scalability
– Inherent SMP Multi-threading
– Parallel Data Query (PDQ)
– Light Scan for fast table scans
– Online Index build
– Efficient Hash Joins
– Auto Fragment Elimination
– Memory Grant Manager (MGM)
– High Performance Loader
– Optimistic Concurrency
• Ease of Management
– Time cyclic data management using Range Partitioning
– Sophisticated Query Optimizer for OLTP and Warehousing
11
Informix Warehousing Moving Forward
Goal is to provide a comprehensive warehousing platform that is highly competitive in the marketplace
Incorporating the best features of XPS and Red Brick into IDS for OLTP/Warehousing and Mixed-Workload
Using the latest Informix technology in:
Continuous Availability and Flexible Grid
Data Warehouse Accelerator using latest industry technology
Integration of IBM’s BI software stack
12
Informix Warehouse Feature
– SQW
– Data Modeling
– ELT/ETL
Informix Warehouse with Storage Optimization/Compression
Cognos integration: Native Content Store on IDS
SQL Merge
Informix Warehouse Roadmap
External Tables
Star Join Optimization
Multi-index Scan
New Fragmentation
Fragment Level Stats
Storage Provisioning
Warehouse Accelerator
13
Informix Warehouse 11.70 Features
14
Typical Data Warehouse Architecture
15
Source: Forrester
Query Tools
Analytics
BPS Apps
BI Apps
LOB apps
Databases
Other transactional data sources
I/O & data loading
Query processing
DBMS & Storage mgmt
11.70 Warehousing Features
Data Loading
HPL
DB utilities
ON utilities
DataStage
External Tables
Online attach/detach
Data & Storage Management
Deep Compression
Interval and List Fragmentation
Online attach/detach
Fragment level stats
Storage provisioning
Table defragmenter
Query Processing
Light Scans
Merge
Hierarchical Queries
Multi-Index Scan
Skip Scan
Bitmap Technology
Star and Snowflake join optimization
Implicit PDQ
Access performance
16
[Figure: SQW architecture in three stages.
DESIGN: Design Studio / Design Center (Eclipse); Data Flows + Control Flows; Debug.
DEPLOY: Deployment preparation; Deployment package (Code Units, Build Profile, User scripts); Deploy.
RUNTIME: Admin Console; HTTP service (WAS) with the SQW Runtime; Applications; Other Servers (DataStage).
Databases shown: SQW Control DB (IDS); SQW Execution DB (IDS); Warehouse DB; Data Source Databases; vendors listed: IDS, DB2, Oracle, SQL Server.]
Informix Warehouse Tooling - SQW
17
SQW: Design Studio
• Design Studio
– Eclipse-based IDE
• Integrated tools, shell sharing
– Team development
• CVS, ClearCase for check-in/check-out of projects, flows
• Data Warehousing Project
– Data Models
– Data Flows
– Control Flows
– Warehouse Applications (deployment packages)
– Subflow & Subprocess (reusable flow modules)
– Variables
• Data Source Explorer
– Database connections to multiple vendors, e.g. Informix, DB2 LUW, Oracle, SQL Server, MySQL, DB2 z/OS
• DataStage Servers
– Integration with IBM DataStage
18
SQW: Data Modeling
Physical Data Model
• Visualized data modeling
• Impact analysis
• Reverse engineering or new from scratch
• Compare & sync
• Generate DDL
• Overview diagram
Shell Sharing with Rational Data Architect & other Data Studio products
19
SQW: Data Flows
Data Flow Operators:
Source & target operators (table, file)
SQL Transformation operators
Warehousing operators
File source
Table source
Table join
aggregation
Table target
20
SQW: Data Flows
A simple flow
Generated SQL code
Optimization across SQL statements.
Optimized staging strategy
In-database transformation
21
SQW: Control Flows
Control flow
Common utility operators
Control logic, parallel execution, loop iteration
Error handling
22
SQW Overview
Design Studio
Eclipse Based Design Environment
Admin Console
Production Environment in WebSphere
deploy
Application package (zip file)
Deployment profile: database connections, machine resources, variable definitions, DDL files, etc.
Generated code
create
Manage warehouse applications
Schedule
Monitor
manage
23
Admin Console
Flex RIA-based Warehouse Admin Console
Admin Console manages common resources (e.g. database connections, FTP servers, DataStage servers)
Schedule & monitor warehouse processes
24
• Time-cyclic data management (roll-on, roll-off)
• Attach and detach online without requiring exclusive lock and access to the table
• Automatically kicks off background process to recollect statistics.
• Interval and List Fragmentation
• Auto fragment-level statistics
[Figure: a table fragmented by month (Jan, Feb, Mar, Apr, ...) inside a window running from Dec 2010 to May 2011; enables storing data over time]
Informix 11.70 Feature: Warehouse Time-Cyclic Data Management
25
Interval Fragmentation
• Fragments data based on an interval value
– E.g. fragment for every month or every million customer records
• Tables have an initial set of fragments defined by a range expression
• When a row is inserted that does not fit in the initial range fragments, IDS will automatically create a fragment to hold the row (no DBA intervention)
• No exclusive lock is required for fragment addition
• All the benefits of fragment by expression
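The automatic fragment creation described above can be pictured with a small Python sketch. It is only a toy model of the bookkeeping, assuming a one-month interval; the class and method names are hypothetical and say nothing about how IDS implements the feature.

```python
from datetime import date


class IntervalFragmentedTable:
    """Toy model: one fragment per calendar month, created on first insert."""

    def __init__(self):
        # (year, month) -> list of rows stored in that fragment
        self.fragments = {}

    @staticmethod
    def fragment_key(order_date):
        # Hypothetical interval: one fragment per month
        return (order_date.year, order_date.month)

    def insert(self, order_date, row):
        key = self.fragment_key(order_date)
        if key not in self.fragments:
            # No fragment covers this interval yet: create it automatically
            self.fragments[key] = []
        self.fragments[key].append(row)
        return key


table = IntervalFragmentedTable()
table.insert(date(2011, 1, 15), {"order_id": 1})
table.insert(date(2011, 1, 28), {"order_id": 2})
table.insert(date(2011, 3, 2), {"order_id": 3})
print(sorted(table.fragments))  # [(2011, 1), (2011, 3)]
```

Only the months that actually receive rows get a fragment, which is the "no DBA intervention" behavior the slide describes.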
26
Informix 11.70 Feature: Multi-Index Scan
• Make use of all available indices
• Use set operations to apply to all rowids
• Use bitmap operations like union and intersection
• Bitmap can also be used for Skip Scan operations
27
Multi-Index Scan – An Example
• Handling common Data Warehouse queries more efficiently
• Large dimension tables, e.g. customer table
• Multiple low-selectivity attributes like gender, age group, zip code, etc.
• Example
SELECT COUNT(customer_id)
FROM customer_table
WHERE gender = 'male'
AND income_category = 'HIGH'
AND education_level = 'MASTERS'
AND zip_code = 95032;
28
Multi-Index Scan Example
• Method #1:
– Evaluate the most selective constraint
– Generate a list of rows that qualify
– Evaluate the remaining constraints for each of those rows to produce the answer to the query
This method retrieves rows based on the most selective constraint, using only the index for that column, and then evaluates each of the other constraints sequentially after retrieval.
29
Multi-Index Scan Example
• Method #2:
– Evaluate each constraint by using a different B-tree index on each attribute, which results in a list of rows that qualify for each constraint
– Merge the lists to form one master list that satisfies all the constraints
– Retrieve the qualifying rows to produce the answer
[Figure: index scans for Gender = 'm', Zipcode = '95032', Income_Category = 'high' and Education_level = 'masters' are combined with AND into sorted RIDs, followed by a sequential skip scan of the records]
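Method #2 can be sketched in a few lines of Python. This is an illustration only: plain dictionaries of row-id sets stand in for the B-tree indexes and bitmaps, and the sample rows and column names are made up.

```python
from collections import defaultdict

# Hypothetical customer rows; "rid" plays the role of the row id
customers = [
    {"rid": 0, "gender": "m", "income": "high", "education": "masters", "zip": "95032"},
    {"rid": 1, "gender": "f", "income": "high", "education": "masters", "zip": "95032"},
    {"rid": 2, "gender": "m", "income": "low", "education": "masters", "zip": "95032"},
    {"rid": 3, "gender": "m", "income": "high", "education": "bachelors", "zip": "95032"},
    {"rid": 4, "gender": "m", "income": "high", "education": "masters", "zip": "95032"},
    {"rid": 5, "gender": "m", "income": "high", "education": "masters", "zip": "94105"},
]


def build_index(rows, column):
    """Single-column index: value -> set of row ids."""
    index = defaultdict(set)
    for row in rows:
        index[row[column]].add(row["rid"])
    return index


indexes = {col: build_index(customers, col)
           for col in ("gender", "income", "education", "zip")}


def multi_index_scan(predicates):
    # One rid set per constraint, each taken from that column's own index
    rid_sets = [indexes[col].get(value, set()) for col, value in predicates.items()]
    # AND the sets together, then sort so rows can be fetched in rid order
    return sorted(set.intersection(*rid_sets))


rids = multi_index_scan(
    {"gender": "m", "income": "high", "education": "masters", "zip": "95032"}
)
print(rids)       # [0, 4]
print(len(rids))  # 2, the COUNT result for this sample data
```

Sorting the merged row ids is what makes the final fetch a forward-only skip scan over the table.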
30
Informix 11.70 Feature: Push Down Hash Join
First, a standard hash join for typical warehousing queries involving a "large" fact table with multiple dimension tables:
• Build hash table on left input
• Probe with right input
• Typically, build on the smaller input, which avoids hash table overflow to disk
[Figure: a Hash Join operator with a Build Scan feeding its build side and a Probe Scan feeding its probe side]
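As a reminder of the build and probe phases, here is a minimal in-memory hash join in Python. It is a generic textbook sketch, not Informix code, and the table and column names are invented for the example.

```python
def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Classic two-phase hash join: build on one input, probe with the other."""
    # Build phase: hash the (ideally smaller) input on its join key
    hash_table = {}
    for row in build_rows:
        hash_table.setdefault(row[build_key], []).append(row)
    # Probe phase: stream the other input and look up matches
    for row in probe_rows:
        for match in hash_table.get(row[probe_key], []):
            yield {**match, **row}


# Hypothetical data: a small dimension table and a larger fact table
dim_region = [
    {"region_id": 1, "region": "West"},
    {"region_id": 2, "region": "East"},
]
fact_sales = [
    {"sale_id": 10, "region_id": 1, "revenue": 100},
    {"sale_id": 11, "region_id": 2, "revenue": 250},
    {"sale_id": 12, "region_id": 1, "revenue": 75},
    {"sale_id": 13, "region_id": 3, "revenue": 40},  # no matching region
]

joined = list(hash_join(dim_region, fact_sales, "region_id", "region_id"))
print(len(joined))  # 3
```

Building on the dimension table keeps the hash table small, which is the point of the "build on smaller input" guideline.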
31
• Large central "Fact" table
• Smaller "Dimension" tables
• Restrictions on Dimension tables (assume independence)
• Small fraction of Fact table in result
[Figure: star schema with Fact (F): 1M rows, sel: 1/1000, joined to Dim (D1), Dim (D2) and Dim (D3): 10K rows each, sel: 1/10 each]
Typical Star Schema: An Example
32
[Figure: left-deep tree of three hash joins over Scan F, Scan D1, Scan D2 and Scan D3, with row-count labels 1M, 100K, 10K and 1K on the edges. Annotation: Problem Join, Second Join Build Too Large]
Prior to 11.70: Standard Left Deep Tree Solution
33
[Figure: hash join tree over Scan F, Scan D1, Scan D2 and Scan D3 in which the edges are labeled 1K rows. Join keys are pushed down to reduce probe size: the fact table is read with a multi-index scan using the join keys and single-column indexes]
11.70 Feature: Pushdown Hash-Join Solution
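The effect of pushing join keys down can be shown with row counts alone. The Python sketch below assumes a tiny synthetic fact table (1,000 rows, three dimension keys, each restriction keeping 1 key in 10) and uses a plain filter where IDS would use a multi-index scan; it illustrates the idea and does not model the optimizer.

```python
from itertools import product

# Synthetic fact table: every combination of three dimension keys 0..9
fact = [{"d1": a, "d2": b, "d3": c} for a, b, c in product(range(10), repeat=3)]

# Dimension keys that survive each dimension's restriction (1 in 10 each)
qualifying = {"d1": {0}, "d2": {0}, "d3": {0}}


def left_deep(fact_rows, keys):
    """Join one dimension at a time; record how many rows enter each join."""
    sizes, rows = [], fact_rows
    for col, allowed in keys.items():
        sizes.append(len(rows))
        rows = [r for r in rows if r[col] in allowed]
    return rows, sizes


def pushdown(fact_rows, keys):
    """Filter the fact table on all join keys first, then run the same joins."""
    reduced = [r for r in fact_rows
               if all(r[col] in allowed for col, allowed in keys.items())]
    return left_deep(reduced, keys)


result_a, sizes_a = left_deep(fact, qualifying)
result_b, sizes_b = pushdown(fact, qualifying)
print(sizes_a)  # [1000, 100, 10]
print(sizes_b)  # [1, 1, 1]
```

Both plans return the same rows; the pushdown plan simply never carries the non-qualifying fact rows through the join tree.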
34
Informix Warehouse Accelerator (IWA)
35
Agenda
• 3rd Generation Data Base Technology
• Overview of the Informix Warehouse Accelerator (IWA)
• Target Market
• Beta Customer Experience
• IWA vs. Row/Column/Hybrid Stores
• Loading IWA
• Referenced Hardware & Software Configuration
36
Third Generation of Database Technology
According to IDC’s Article (Carl Olofson) – Feb. 2010
1st Generation:
- Vendor proprietary databases of IMS, IDMS, Datacom
2nd Generation:
- RDBMS for Open Systems, dependent on disk layout, limitations in scalability and disk I/O
- Database tuning by updating stats, creating/dropping indexes, data partitioning, summary tables & cubes, forcing query plans, resource governing
3rd Generation: IDC Predicts that within 5 years:
• Most data warehouses will be stored in a columnar fashion
• Most OLTP databases will either be augmented by an in-memory database (IMDB) or reside entirely in memory
• Most large-scale database servers will achieve horizontal scalability through clustering
37
Example of 2nd Generation Database Disk I/O Issue
38
How Oracle/Exadata Solves That Problem: Add an I/O Layer
39
Sun Oracle Database Machine Full Rack
• Each Exadata cell is a self-contained server which houses disk storage and runs the Exadata software
• Databases are deployed across multiple Exadata cells
• Database enhanced to work in cooperation with Exadata intelligent storage
14 Exadata Storage Cells (Storage Servers), each with:
• 8 cores
• 24 GB memory
• 12 disks (600 GB/2 TB)
Up to 1.5 GB/sec I/O bandwidth per cell => 21 GB/sec per DB machine
8 Oracle RAC Database Servers, each with:
• 8 cores
• 72 GB memory
InfiniBand Switches/Network
InfiniBand 16 Gigabit per Channel
40
Cost of Oracle/Exadata Solution
• Database Machine price – Full Rack
$1,115,000 Hardware (same price for 600GB or 2TB drives)
$1,680,000 Oracle Exadata Storage Server software
$1,520,000 Oracle 11gR2 Enterprise Edition
$736,000 Oracle Real Application Clusters
$368,000 Oracle Partitioning
$368,000 Advanced Compression
$160,000 Enterprise Manager Diagnostic Pack (recommended)
$160,000 Enterprise Manager Tuning Pack (recommended)
$1,098,240 1st year software support and maintenance
---------------------------------------------------------------------------------------------------------
$7,240,240 Total Price
• Excludes OLAP option, Data Mining option, ETL option
• Installation is extra and requires a custom quote
42
Informix Warehouse Accelerator: 3rd Generation Database Technology is Here
How is it different?
• Performance: Unprecedented response times to enable 'train of thought' analysis frequently blocked by poor query performance.
• Integration: Connects to IDS through deep integration, providing transparency to all applications.
• Self-managed workloads: Queries are executed in the most efficient way.
• Transparency: Applications connected to IDS are entirely unaware of IWA.
• Simplified administration: Appliance-like hands-free operations, eliminating many database tuning tasks.
What is it?
The Informix Warehouse Accelerator (IWA) is a workload-optimized, appliance-like add-on that enables the integration of business insights into operational processes to drive winning strategies. It accelerates select queries, with unprecedented response times.
Breakthrough Technology Enabling New Opportunities
43
Breakthrough technologies for performance
• Row & Columnar Database: Row format within IDS for transactional workloads and columnar data access via accelerator for OLAP queries.
• Extreme Compression: Required because RAM is the limiting factor.
• Massive Parallelism: All cores are used for queries.
• Predicate Evaluation on Compressed Data: Often scans without decompression during evaluation.
• Frequency Partitioning: Enabler for the effective parallel access of the compressed data for scanning. Horizontal and vertical partition elimination.
• In-Memory Database: 3rd generation database technology avoids I/O. Compression allows huge databases to be completely memory resident.
• Multi-core and Vector-Optimized Algorithms: Avoiding locking or synchronization.
44
Informix Warehouse Accelerator Configuration
IDS:
• Routes SQL queries to accelerator
• User need not change SQL or apps.
• Can always run query in IDS, e.g., if
– too short an est. execution time
Informix Warehouse Accelerator:
• Connects to IDS via TCP/IP & DRDA
• Analyzes, compresses, and loads a copy of (a portion of) the warehouse
• Processes routed SQL query and returns answer to IDS
[Figure: SQL queries (from apps) reach IDS, which holds the data warehouse. The Query Router in IDS sends SQL (via DRDA) over TCP/IP to the accelerator's Query Processor, which works on the compressed DB partition and sends results back. A Bulk Loader feeds the accelerator.]
45
Informix Warehouse Accelerator Overview
IDS
Query parsing and matching to the Optimizer. Routing query blocks.
Coordinator Process
Orchestrating the distributed tasks like Load or Query execution.
Worker Processes
Have all the data in main memory spread across all cores. Do the compression and query execution.
46
Target Market: Business Intelligence (BI)
• Characterized by:
– "Star" or "snowflake" schema
– Complex, ad hoc queries that typically
― Look for trends, exceptions to make actionable business decisions
― Touch large subset of the database (unlike OLTP)
― Involve aggregation functions (e.g., COUNT, SUM, AVG, …)
– The "Sweet Spot" for the IWA!
[Figure: star schema with fact table SALES and dimensions City, Region, Store, Product, Brand, Category, Period, Month, Quarter]
47
What IWA is Designed For
• Selective, fast scans over large (fact) tables
• Joins with smaller Dimension tables
• OLAP-style queries over large fact tables in relational star schema with grouping and aggregations
SELECT PRODUCT_DEPARTMENT, REGION, SUM(REVENUE)
FROM FACT_SALES F
INNER JOIN DIM_PRODUCT P ON F.FKP = P.PK
INNER JOIN DIM_REGION R ON F.FKR = R.PK
LEFT OUTER JOIN DIM_TIME T ON F.FKT = T.PK
WHERE T.YEAR = 2009
AND R.GEOID = 17
AND P.TYPEID = 3
GROUP BY PRODUCT_DEPARTMENT, REGION
48
Case Study #1: Major U.S. Shoe Retailer
• Top 7 time-consuming queries in Retail BI and Warehouse: (Against 1 Billion rows Fact Table)
Query IDS 11.5 IDS 11.7 IWA
1 22 mins 4 secs
2 1 min 3 secs 2 secs
3 3 mins 40 secs 2 secs
4 30 mins & up 4 secs
5 2 mins 2 secs
6 30 mins 2 secs
7 45 mins & up 2 secs
Our Retail users will be really happy to see such a huge improvement in the queries processing timings.
This IWA extension to IDS will really bring value to the Retail BI environment.
49
• A MicroStrategy report was run, which generates 667 SQL statements, of which 537 were SELECT statements
• Datamart for this report has 250 Tables and 30 GB Data size
• Original report on XPS and Sun Sparc M9000 took 90 mins
• With IDS 11.7 on Linux Intel box, it took 40 mins
• With IWA, it took 67 seconds.
Case Study #2: Datamart at a Government Agency
50
Case Study #3: U.S. Government Agency
Query | Description | Informix | Informix w/ IWA | Notes | Improvement
1 | Find Top 100 Entities | 1:28:22 | 0:01:28 | Fact Table Scan | 6023.23%
2 | Find Top 100 Members | 1:22:32 | 0:01:05 | Fact Table Scan | 7640.45%
3 | Summarize all transactions by State and County | 1:34:37 | 0:00:14 | Fact Table Scan | 41708.49%
4 | Detailed Report on Specific Programs in a Date Range | 0:00:06 | 0:00:06 | Index Read | 108.41%
5 | Summarize all transactions by State, County, City, State, Zip, Program, Program Year, Commodity and Fiscal Year | 1:48:58 | 0:00:41 | Fact Table Scan | 15800.89%
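The slide does not say how the Improvement column was computed. The figures are consistent with the ratio of the two elapsed times expressed as a percentage, and the Python sketch below recomputes that ratio from the rounded h:mm:ss values in the table. Because the inputs are rounded to whole seconds, the results only approximate the published percentages (for example, about 6025% for query 1); the helper names are made up for this note.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def improvement_percent(before, after):
    """Elapsed time without IWA divided by elapsed time with IWA, as a percentage."""
    return 100.0 * to_seconds(before) / to_seconds(after)


# (Informix, Informix w/ IWA) elapsed times as printed in the table
timings = {
    1: ("1:28:22", "0:01:28"),
    2: ("1:22:32", "0:01:05"),
    3: ("1:34:37", "0:00:14"),
    4: ("0:00:06", "0:00:06"),
    5: ("1:48:58", "0:00:41"),
}

for query, (before, after) in timings.items():
    print(query, round(improvement_percent(before, after)))
```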
52
Row Oriented Data Store
Each row stored sequentially
• Optimized for record I/O
• Fetch and decompress entire row, every time
• Result –
• Very efficient for transactional workloads
• Not always efficient for analytical workloads
If only a few columns are required, the complete row is still fetched and uncompressed
53
Columnar Data Store
Data is stored sequentially by column
If attributes are not required for a specific query execution, they are skipped completely.
• Data is compressed sequentially by column:
– Aids sequential scan
– Slows random access
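The difference between the two layouts can be seen by counting how many field values a simple aggregate has to touch. The following Python sketch uses a three-row, three-column table invented for the purpose and ignores compression entirely.

```python
# Row store: each record keeps all of its attributes together
rows = [
    {"customer": "A", "region": "West", "revenue": 100},
    {"customer": "B", "region": "East", "revenue": 250},
    {"customer": "C", "region": "West", "revenue": 75},
]

# Column store: one sequential list per attribute
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_revenue_row_store(table):
    """Every attribute of every row is fetched, even though only one is used."""
    total, touched = 0, 0
    for row in table:
        touched += len(row)
        total += row["revenue"]
    return total, touched


def sum_revenue_column_store(table):
    """Only the referenced column is read; the others are skipped completely."""
    values = table["revenue"]
    return sum(values), len(values)


print(sum_revenue_row_store(rows))        # (425, 9)
print(sum_revenue_column_store(columns))  # (425, 3)
```

Fetching a single complete record shows the opposite trade-off: the column layout has to visit one list per attribute, which is the "slows random access" point above.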
54
[Figure: a Trade Info table (volume, product, origin country) partitioned into cells by column partitions. A histogram on Origin (number of occurrences) separates common values (China, USA; then GER, FRA, …) from rare values (Rest). A histogram on Product separates the top 64 traded goods (6-bit code) from the Rest. Columns shown: Prod, Origin, Vol.]
Compression: Frequency Partitioning
• Field lengths vary between cells
• Higher frequencies get shorter codes (approximate Huffman)
• Field lengths are fixed within cells
[Figure: the table divided into Cell 1 through Cell 6]
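A rough Python sketch of the idea: the distinct values of a column are ranked by frequency, the most common ones go into a partition with short fixed-width codes, and the rest go into a partition with longer codes. The cut-off (`top_n`), the sample data and the function names are assumptions made for illustration; the actual IWA encoding is not described on the slide beyond the bullets above.

```python
from collections import Counter
from math import ceil, log2


def column_partitions(values, top_n):
    """Split a column's distinct values into a 'common' and a 'rest' partition."""
    ranked = [value for value, _ in Counter(values).most_common()]
    partitions = []
    for part in (ranked[:top_n], ranked[top_n:]):
        if part:
            # Code width is fixed inside a partition but differs between partitions
            bits = max(1, ceil(log2(len(part))))
            partitions.append({v: format(i, f"0{bits}b") for i, v in enumerate(part)})
    return partitions


def cell_of(row, partitions_by_column):
    """A cell is identified by the partition index chosen in every column."""
    return tuple(
        next(i for i, codes in enumerate(parts) if row[col] in codes)
        for col, parts in partitions_by_column.items()
    )


# Hypothetical Origin column: two very common values and a tail of rare ones
origins = ["China"] * 5 + ["USA"] * 4 + ["GER", "FRA", "ITA", "JPN", "BRA"]
origin_parts = column_partitions(origins, top_n=2)

print(origin_parts[0])              # {'China': '0', 'USA': '1'}
print(len(origin_parts[1]["GER"]))  # 3
print(cell_of({"origin": "USA"}, {"origin": origin_parts}))  # (0,)
```

With 64 values in the common partition the same calculation gives the 6-bit code mentioned for the top 64 traded goods.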
55
Compression Process: Step 1
[Figure: the input tuple (Male, John, 08/10/06, Mango) is encoded in stages. Columns 1 & 2 go through a co-code transform (Male/John) and Column 3 goes through a type-specific transform that splits it into Column 3.A and Column 3.B. Other labels on the slide: Male/John/Sat, Sat 2006, w35, w35/Mango. Each resulting column is Huffman-encoded against its dictionary (Dict) into a column code: 101101011, 001 and 01011101, annotated p = 1/512, p = 1/8 and p = 1/512. The column codes are concatenated into the tuple code 10110101100101011101.]
Michael 4.2%
David 3.8%
James 3.6%
Robert 3.5%
John 3.5%
William 2.5%
Mark 2.4%
Richard 2.3%
Thomas 1.9%
Steven 1.5%
Mon Tue Wed Thu Fri Sat Sun
Male 3% 4% 10% 6% 23% 42% 12%
Female 4% 5% 9% 15% 17% 28% 22%
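To show how frequencies turn into variable-length codes, the Python sketch below runs a standard Huffman construction over the first-name frequencies listed above. Two assumptions apply: the slide says IWA uses an approximate Huffman scheme, so this textbook version is a stand-in, and only the ten listed names are encoded even though their frequencies do not add up to 100%.

```python
import heapq
from itertools import count

# First-name frequencies (percent) as listed on the slide
name_freq = {
    "Michael": 4.2, "David": 3.8, "James": 3.6, "Robert": 3.5, "John": 3.5,
    "William": 2.5, "Mark": 2.4, "Richard": 2.3, "Thomas": 1.9, "Steven": 1.5,
}


def huffman_codes(freqs):
    """Textbook Huffman coding: repeatedly merge the two lightest subtrees."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tiebreak), {symbol: ""}) for symbol, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


codes = huffman_codes(name_freq)
for name in sorted(codes, key=lambda n: (len(codes[n]), n)):
    print(name, codes[name])
```

Frequent names never receive longer codes than rare ones, and no code is a prefix of another, which is what allows the column codes to be concatenated into a tuple code.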
56
Compression Process: Step 2
[Figure: the tuple codes are sorted. The first tuple code is stored unchanged. For every later tuple code, the previous tuple code is subtracted to obtain a delta, the delta is Huffman-encoded against a dictionary (Dict) into a delta code, and the delta code is appended to the compression block. In the slide's example the first tuple code 10110101110001011101 is followed by the delta codes 000, 010 and 1110.]
Look Ma, no delimiters! 101101011100010111010000101110
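The delta step can be sketched in Python as follows. The sketch covers only sorting and differencing; the Huffman encoding of the deltas is left out. The starting code is the 20-bit tuple code from the slide, while the neighboring codes are made up so that the deltas come out as 1, 1 and 5.

```python
def delta_encode(tuple_codes):
    """Sort the tuple codes, keep the first, and store differences for the rest."""
    ordered = sorted(tuple_codes)
    deltas = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return ordered[0], deltas


def delta_decode(first, deltas):
    """Rebuild the sorted tuple codes by running sums over the deltas."""
    values = [first]
    for delta in deltas:
        values.append(values[-1] + delta)
    return values


base = int("10110101110001011101", 2)  # first tuple code shown on the slide
# Hypothetical neighbors, deliberately given out of order
tuple_codes = [base + 2, base, base + 7, base + 1]

first, deltas = delta_encode(tuple_codes)
print(format(first, "b"))  # 10110101110001011101
print(deltas)              # [1, 1, 5]
```

Sorting makes adjacent codes close together, so the deltas are small numbers that compress far better than the full-width tuple codes.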
58
Register Stores Facilitate SIMD Parallelism
• Access only the banks referenced in the query (like a column store):
SELECT SUM(T.G)
FROM T
WHERE T.A > 5
GROUP BY T.D
• Pack multiple rows from the same bank into the 128-bit register
• Enables yet another layer of parallelism: SIMD (Single-Instruction, Multiple-Data)!
[Figure: a cell block is stored as three banks. Bank β1 (32 bits) holds columns A, D, G; Bank β2 (32 bits) holds columns B, E, F; Bank β3 (16 bits) holds columns C, H. Four 32-bit values from the same bank (rows 1 to 4) are packed into one 128-bit register, and a single vector operation applies the operand to all four at once, producing Result1 through Result4.]
59
Simultaneous Evaluation of Equality Predicates
State == 'CA' && Quarter == 'Q4'
Translate value query to code query:
State == 01001 && Quarter == 1110
[Figure: each row is ANDed (&) with the mask 11111 0 1111 0 and compared (==) with the pattern 01001 0 1110 0 to give the selection result; State and Quarter sit at fixed offsets in the row]
• CPU operates on 128-bit units
• Lots of fields fit in 128 bits
• These fields are at fixed offsets
• Apply predicates to all columns simultaneously!
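The mask-and-compare trick can be tried out with ordinary Python integers. The bit layout below (5-bit State code, one unrelated bit, 4-bit Quarter code, one unrelated bit) is read off the mask shown on the slide; the helper names and sample rows are assumptions, and a real implementation would apply the same operation to 128-bit registers.

```python
def pack_row(state_code, other1, quarter_code, other2):
    """Pack the fields at fixed offsets: state(5) | other(1) | quarter(4) | other(1)."""
    word = state_code
    word = (word << 1) | other1
    word = (word << 4) | quarter_code
    word = (word << 1) | other2
    return word


MASK = 0b11111_0_1111_0     # selects the State and Quarter bits only
PATTERN = 0b01001_0_1110_0  # State == 01001 ('CA') and Quarter == 1110 ('Q4')


def matches(word):
    # One AND and one compare evaluate both equality predicates together
    return (word & MASK) == PATTERN


rows = [
    pack_row(0b01001, 1, 0b1110, 0),  # CA, Q4
    pack_row(0b01001, 0, 0b0001, 1),  # CA, different quarter
    pack_row(0b00011, 1, 0b1110, 1),  # different state, Q4
    pack_row(0b01001, 0, 0b1110, 1),  # CA, Q4
]
print([matches(word) for word in rows])  # [True, False, False, True]
```

The bits outside the mask can hold anything, which is why the other columns stored in the same word do not disturb the comparison.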
61
Defining What Data to Accelerate
• A MART is a logical collection of tables which are related to each other. For example, all tables of a single star schema would belong to the same MART.
• The administrator uses a rich client interface to define the tables which belong to a MART together with the information about their relationships.
• IDS creates definitions for these MARTs in its own catalog. The related data is read from the IDS tables and transferred to IWA.
• The IWA transforms the data into a highly compressed, scan optimized format which is kept locally (in memory) on the Accelerator
Define
Worker Processes
Coordinator Process
IDS + IWA
62
IWA Design Studio
63
Distributing data from IDS (Fact tables)
A copy of the IDS data is transferred to the Worker processes. Each Worker process holds a subset of the data (compressed) in main memory and is able to execute queries on this subset. The data is evenly distributed (no value-based partitioning) across the CPUs.
[Figure: the fact table's data fragments are unloaded in parallel (UNLOAD) by IDS stored procedures and copied to the accelerator, where a Coordinator process and several Worker processes each hold compressed data]
64
Distributing data from IDS (Dimension tables)
All dimension tables are transferred to each Worker process.
[Figure: the dimension tables in IDS are unloaded in parallel (UNLOAD) by an IDS stored procedure; on the accelerator, a Coordinator process and three Worker processes are shown, and every Worker process holds a copy of all four dimension tables]
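The two distribution slides can be summarized in a short Python sketch: fact rows are spread evenly over the workers without looking at their values, while every worker receives all dimension tables. Round-robin assignment is an assumption used here to get an even spread; the slides only state that the distribution is even and not value based.

```python
def distribute(fact_rows, dimension_tables, n_workers):
    """Spread fact rows evenly; give every worker a copy of each dimension table."""
    workers = [
        {"fact": [], "dims": dict(dimension_tables)}  # all dimensions on every worker
        for _ in range(n_workers)
    ]
    for position, row in enumerate(fact_rows):
        # Assumed round-robin placement: depends on position only, never on values
        workers[position % n_workers]["fact"].append(row)
    return workers


# Hypothetical mart: ten fact rows and two small dimension tables
fact_rows = [{"sale_id": i, "region_id": i % 2} for i in range(10)]
dimension_tables = {
    "dim_region": [{"region_id": 0, "region": "West"}, {"region_id": 1, "region": "East"}],
    "dim_time": [{"year": 2009}, {"year": 2010}],
}

workers = distribute(fact_rows, dimension_tables, n_workers=3)
print([len(w["fact"]) for w in workers])     # [4, 3, 3]
print([sorted(w["dims"]) for w in workers])  # both dimension tables on every worker
```

Because each worker has the dimension tables locally, it can join its own slice of the fact table without talking to the other workers.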
65
Mapping Data from IDS to IWA
[Figure: inside IDS, a fact table made up of data fragments plus four dimension tables; inside IWA, the same fact table fragments and dimension tables held in compressed form]
67
IWA Referenced Hardware Configuration
Intel(R) Xeon(R) CPU X7560 @ 2.27 GHz, 4 x 8
Memory: 512 GB
6 SAS hard disk drives, 300 GB each
Options:
- 4-processor, 4U rack-optimized enterprise server with Intel® Xeon® processors
- 8-core, 6-core and 4-core processor options with up to 2.26 GHz (8-core), 2.66 GHz (6-core) and 1.86 GHz (4-core) speeds with up to 16 MB L3 cache
- Scalable from 4 sockets and 64 DIMMs to 8 sockets and 128 DIMMs
- Optional MAX5 32-DIMM memory expansion
- 16x 1.8" SAS SSDs with eXFlash or 8x 2.5" SAS HDDs
68
IWA Software Components
• Linux on Intel x86_64 (RHEL 5 or SUSE SLES 11)
• IDS 11.70 + IWA code modules including IDS Stored Procedures
• ISAO Studio Plug-in – GUI for Mart definition
• OnIWA – On Utilities for Monitoring IWA
69
(Fred Ho – [email protected])
70