TRANSCRIPT
When queries are rerouted from DB2 to IDAA, they typically perform better
by one to two orders of magnitude. This presentation shows how data
distribution and clustering within IDAA can further improve response times.
We also discuss approaches to automatically detect and implement efficient
distribution and organizing keys by analyzing IDAA data access patterns and
data statistics. Reporting IDAA benefits to management will also be
highlighted. Finally, we compare IDAA to Oracle's In-Memory Column
Store option from a technical perspective, including a comparison of query
response times for different OLAP queries.
1
2
Disclaimer:
The information contained in this presentation has not been submitted to any formal Swiss Mobiliar or other review and is distributed on an 'as is' basis without any warranty, either expressed or implied. The use of this information is the user's responsibility. The procedures, results and measurements presented in this paper were run in either the test and development environment or the production environment at Swiss Mobiliar in Berne, Switzerland. There is no guarantee that the same or similar results will be obtained elsewhere. Users attempting to adapt these procedures and data to their own environments do so at their own risk. All procedures presented have been designed and developed for educational purposes only.
3
4
5
6
7
8
9
10
11
In contrast to many other IDAA installations, the scope of IDAA at Swiss
Mobiliar is focused on operational data, i.e. data of the core
information systems, and not on replicated data typically used for decision
support systems.
Additionally, data from other platforms is also replicated into IDAA in order
to benefit from Netezza's MPP architecture, which produces much better
response times, and to allow joining this non-DB2 data with DB2 data in an
efficient way.
12
13
So far, existing workload has been rerouted to IDAA and optimized for
speed. Due to the reduced response times, many more such reports were
produced.
In order to create even more opportunities, the focus now switches to making
the business aware of new types of queries, new business functions, etc.:
increase the business awareness of IDAA.
14
After installation and first tests, ad-hoc analytical queries were rerouted to
IDAA, followed by scheduled workload, COBOL programs, SQL code in
Excel macros, reporting tools, etc.
From a business perspective, the first IDAA-based application was log
analysis on DB2 tables to derive insights into application usage and access
patterns, followed by improved end-of-month processing, still more ad-hoc
reports from a larger user base, and improved ETL flows.
Eventually, a couple of DB2 secondary indexes could be removed, and
physical database design options such as "append on insert" or member
clustering were applied more often, leading to even better response times
and reduced CPU consumption.
15
We don't store Facebook or Twitter data in DB2, not even the session information of internet
users of our web pages. As IDAA is DB2-based, such information is not within the scope of
IDAA. But to make business units aware of the possibilities of IDAA-based reports, we
analyzed the behaviour of our agencies' employees when it comes to using our CRM system.
Instructions exist on using it as a primary hub before switching to applications containing
more detailed information. A large bubble in the slide highlights an agency where this policy
is strictly followed; the smaller the bubbles, the less this policy is applied, with employees
directly accessing the detail applications without starting at the CRM system.
This information on internal system usage showed some surprising results, and as the
business units are directly concerned, their awareness of this kind of report, and of IDAA in
general, is raised.
But what about real-time analytics? The slide above represents an observation interval of
14 months. How did they do this morning?
Most agencies don't seem to be much impressed by those statistics. However, at least some
of them changed their behaviour: see Hochdorf in the Lucerne region (marked in green),
which progressed from average to top.
16
17
18
19
20
21
Ad-hoc queries and other analytical workload running directly on DB2 often
require indexes, which means that these queries must be known in advance
and optimized by your DBA. In other words, this is an expensive and time-
consuming process: if you come up with a new idea not yet supported by
indexes or MQTs, they have to be built first. If these queries run without
proper support, they can severely impact online transaction response
times. Another downside of indexes is their need for space, and the resources
necessary to keep them updated. Eventually, the whole system either gets
over-indexed or new queries run without index support; either way, the effect
is that end users no longer query the database because response times have
become unacceptable, and therefore the information residing in the database
is no longer analyzed.
22
23
Data residing in IDAA is stored and accessed in a column-based
paradigm rather than the row-oriented paradigm used by DB2. This makes the
amount of data to be scanned much smaller: a typical analytical query reads a
few attributes of very many rows and has to access the referenced columns
only, not the whole table as is the case for row-based access. Thus, indexes
become obsolete for this kind of access. Furthermore, data compression is
much more effective on IDAA compared to DB2, and inherent parallelism for
query processing is widely used without much administration effort due to
Netezza's broad usage of its MPP (massively parallel processing) architecture.
The downside of this technology is that directly accessing a single table row is
as complex as a scan through the whole table, and updating a single row
becomes very expensive.
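The scan-volume advantage of column-oriented storage can be illustrated with a back-of-the-envelope calculation. This is a simplified sketch with made-up table dimensions, not a description of IDAA internals:

```python
# Simplified sketch: why a column store scans less data than a row store.
# Table dimensions and value sizes below are illustrative assumptions.
rows, cols, bytes_per_value = 1_000_000, 20, 8
referenced_cols = 2  # a typical analytical query touches few attributes

row_store_bytes = rows * cols * bytes_per_value             # whole rows read
col_store_bytes = rows * referenced_cols * bytes_per_value  # referenced columns only

print(row_store_bytes // col_store_bytes)  # -> 10, i.e. 10x less data scanned
```

Compression on top of this (which works especially well on sorted, low-cardinality columns) widens the gap further.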
24
25
26
Rules of thumb for selecting distribution keys:
- A random distribution key provides good access paths.
- For tables with more than 100 million rows, an explicit distribution key
  should be selected.
- The choice of the distribution key is driven by:
  - data skew
  - processing skew
  - avoidance of data redistribution
- A distribution key should consist of only one column.
- Use a random distribution key for small reference tables.
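The interplay of key cardinality and skew can be sketched in a few lines. This is a hypothetical model of hash distribution across MPP worker nodes; the worker count and key values are made up and do not reflect Netezza's actual hashing:

```python
# Hypothetical sketch: hash-distributing rows across MPP worker nodes.
from collections import Counter

def rows_per_worker(keys, workers=4):
    """Assign each row to a worker by hashing its distribution-key value."""
    counts = Counter(hash(k) % workers for k in keys)
    return [counts.get(w, 0) for w in range(workers)]

# Low-cardinality key (e.g. a status code): rows pile up on few workers,
# so one worker does most of the scanning (processing skew).
skewed = rows_per_worker([1] * 900 + [2] * 100)
# High-cardinality key (e.g. a policy number): rows spread evenly.
even = rows_per_worker(range(1_000))

print(max(skewed) - min(skewed))  # large spread between workers
print(max(even) - min(even))      # -> 0
```

A scan is only as fast as the busiest worker, which is why a skewed key wastes most of the MPP parallelism.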
27
28
29
30
Rules of thumb for selecting organizing keys:
- Small tables don't benefit from organizing keys, due to the small
  amount of data to be scanned.
- Large tables (> 1 million records) benefit most, assuming that
  queries restrict on column values which are physically
  scattered across the table.
- There is no preference for any of the organizing key columns.
- Not all organizing keys need to be referenced in a query
  for organizing keys to improve query performance.
- All data types are supported:
  - only the first 8 bytes of CHAR data types are considered
  - for numerics, up to 18 digits are considered
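The reason scattered column values benefit from organizing keys is the zone-map idea: per data extent, the minimum and maximum of the organized column are recorded, so a scan can skip extents that cannot match. The sketch below is a simplified model of that idea, not Netezza's actual on-disk format:

```python
# Simplified zone-map sketch: per extent, record min/max of the organized
# column; a scan skips extents whose range cannot contain the predicate value.

def build_zone_map(values, extent_size=1_000):
    extents = [values[i:i + extent_size] for i in range(0, len(values), extent_size)]
    return [(min(e), max(e)) for e in extents]

def extents_to_scan(zone_map, predicate_value):
    return [i for i, (lo, hi) in enumerate(zone_map) if lo <= predicate_value <= hi]

# Organized (clustered) by the key column: one extent needs scanning.
clustered = sorted(range(10_000))
# Scattered: every value occurs in every extent, so nothing can be skipped.
scattered = [v % 10 for v in range(10_000)]

print(len(extents_to_scan(build_zone_map(clustered), 4321)))  # -> 1
print(len(extents_to_scan(build_zone_map(scattered), 4)))     # -> 10
```

This also explains why small tables gain little: with only a handful of extents, there is hardly anything to skip.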
31
32
33
34
The following query calculates the column skew value displayed on the slide.
Example for CARDF = 54321 (the top 10% of the distinct values, i.e. 5432 rows):

select (100.0 * sum(no) / (select count(*) from T1) - 10.0)
from
  (select C1, dec(count(*), 15, 3) as no
   from T1
   group by C1
   order by 2 desc fetch first 5432 rows only) t

This query calculates the percentage of rows in the table whose column value
is among the 10% most frequent column values, minus the 10% expected for a
uniform distribution. In general, replace the fixed 10% value (n = 0.1) with
any individually selected value for n:

select (100.0 * sum(no) / (select count(*) from T1) - 100 * n)
from
  (select C1, dec(count(*), 15, 3) as no
   from T1
   group by C1
   order by 2 desc fetch first CARDF * n rows only) t
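For checking the formula against sample data outside of DB2, the same skew measure can be re-implemented in a few lines of Python. The function and test data below are our own illustrative construction, not part of the IDAA tooling:

```python
# Illustrative re-implementation of the skew formula: share of rows held by
# the n*100 % most frequent values, minus the n*100 % a uniform column yields.
from collections import Counter

def column_skew(values, n=0.1):
    freq = sorted(Counter(values).values(), reverse=True)
    top = freq[: int(len(freq) * n)]  # the CARDF * n most frequent values
    return 100.0 * sum(top) / len(values) - 100.0 * n

uniform = list(range(1_000)) * 10                 # every value equally frequent
dominated = [0] * 9_000 + list(range(1, 1_001))  # one value dominates

print(column_skew(uniform))    # -> approximately 0 (no skew)
print(column_skew(dominated))  # large positive value
```

A value near 0 means the column is a reasonable distribution-key candidate; a large positive value signals data skew.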
35
36
37
38
//AQTSCS03 EXEC PGM=IKJEFT01
//* parameter #1 for accelerator name
//AQTP1 DD *
IDAADB2P
/*
//* parameter #2 for alter containing tables specification
//AQTP2 DD *
<?xml version="1.0" encoding="UTF-8"?>
<aqttables:tableSpecifications
xmlns:aqttables="http://www.ibm.com/xmlns/prod/dwa/2011" version="1.0">
<table name="TEO_TDSTAT2" schema="DB2PROD">
<distributionKey>
<column name="C63654"/>
<column name="C32006"/>
</distributionKey>
<organizingKey name="C63654"/>
<organizingKey name="C63655"/>
</table>
</aqttables:tableSpecifications>
/*
//* parameter #3 for message input to control trace
//AQTMSGIN DD *
<?xml version="1.0" encoding="UTF-8" ?>
<spctrl:messageControl
xmlns:spctrl="http://www.ibm.com/xmlns/prod/dwa/2011"
version="1.0" versionOnly="false" >
</spctrl:messageControl>
/*
//SYSTSPRT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SYSTSIN DD *
DSN SYSTEM(DB2P)
RUN PROGRAM(AQTSCALL) PLAN(AQTSCALL) -
LIB('SYS1.DAA310B.U.LOAD') PARMS('ALTERTABLES')
END
/*
39
40
41
42
43
44
45
By avoiding the unnecessary movement of data off-platform, real-time
analytics becomes possible: decisions are made based on the most accurate
data available, not on some stale copy. Integrating analytics technologies
with transactional systems enables insights to be injected directly into
operational decision processes. Analytics will share the same
business-critical support that operational systems enjoy today.
46
47