temporal tables and data versioning - now and before db2...

The attendee will get a deep dive into all the DDL changes needed in order to exploit DB2 V10 Temporal tables as well as the limitations. A case study scenario using a live DB2 V10 system will be used to illustrate how the data is versioned and how the application needs to understand what to request in order to get the desired data. Finally we will look at which currently implemented solutions can be modified to use Temporal tables instead.



Everything I say and write is solely my own expression and does not reflect the opinion of CA Technologies.

The IBM documentation has been used only sporadically, so results and syntax may vary from the documentation provided by IBM.

Since everyone working with DB2 (and other database management systems, for that matter) has had the need to implement what temporal tables provide in DB2 10, let's have a quick look at some of the ways customers have survived until now.

The majority of the presentation will be about DDL and DML changes, illustrated by a scenario.

Unlike many previous new features (like clone tables, NOT LOGGED, etc.), the utility impact when dealing with temporal tables is minimal.

This is a definition of temporal data from Wikipedia. It is an interesting article, and I can recommend reading it to get deeper insight (I have only included a small portion on this slide).

Some historical facts about temporal tables, found on the same result list as the previous slide. It is quite interesting that TEMPORAL DATA in database terminology is relatively young.

I believe most of us have implemented solutions to handle temporal data for many years, but we had to come up with special implementations and design the data model to support this need.

The most straightforward solution was to have a base table with the live data and then implement application logic to maintain a history table. Hopefully the application code remembered to look in the various places for the “correct” information.

DB2 V7 automated some of these tasks by allowing TRIGGERS. At least some of the application logic could be eliminated, but there was still the opportunity for failure when the “correct” data was to be retrieved.
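
As a rough illustration of the trigger approach, a minimal sketch might look like the following. None of the object names come from the presentation: PERSON, PERSON_HIST and their columns are hypothetical, and a real design would also need a matching DELETE trigger.

```sql
-- Hypothetical pre-DB2 10 design: a trigger copies the old row image
-- into a hand-made history table whenever the base row is updated.
-- PERSON, PERSON_HIST and all column names are illustrative assumptions.
CREATE TRIGGER PERSON_UPD
  AFTER UPDATE ON PERSON
  REFERENCING OLD AS O
  FOR EACH ROW MODE DB2SQL
  BEGIN ATOMIC
    INSERT INTO PERSON_HIST (SSN, NAME, ADDRESS, CHANGED_TS)
      VALUES (O.SSN, O.NAME, O.ADDRESS, CURRENT TIMESTAMP);
  END
```

The trigger only takes care of writing history. Finding the image that was valid at a given time is still left to the application, which is where the opportunity for failure mentioned above remains.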

Stored procedures were another alternative to assist with maintaining and retrieving “temporal data”.

Bottom line: no matter which solution was chosen, maintenance was an issue and just made the concept more complicated.

In order not to have the application code maintain history, some sites used a completely different method (and still do).

Depending on how accurate and up-to-date the history information must be, this solution might not be valid, since it requires access to the DB2 log.

Once a day (or as frequently as needed), the log records for the tables where data versioning is needed are extracted and loaded into a special history table (basically the base table with a timestamp column appended).

This method is much less error-prone, but the timing (how current the history table must be) can be an issue and will have to be dealt with.

The nice part of this implementation is that the transaction doesn't incur any performance penalty from inserting rows into the history table, calling triggers, etc.

DB2 for z/OS offers three types of temporal tables, all different in nature, in how they operate and in how they are created.

SYSTEM TIME temporal tables are controlled solely by DB2, which keeps “old images” in a separate history table every time a row is updated or deleted.

BUSINESS TIME temporal tables are controlled to a greater extent by the application. Unlike SYSTEM TIME, this type has no history table associated with it: if a row has different content based on time, this is handled inside the same table.

It is possible to mix both types, and then you have BI-TEMPORAL tables.

The easiest way to see the differences is by using a scenario with a number of events reflected in the TEMPORAL tables. This presentation only deals with SYSTEM TIME and BUSINESS TIME.

The story we will use in this presentation is “Steen's life and where he lived”, illustrating how temporal tables can be used to record and store changes “out-of-the-box”.

This matrix illustrates the different “transactions” we will incorporate into the SYSTEM TIME temporal table scenario to illustrate the lifetime of a row, as well as the DDL and DML needed to implement this kind of solution.

What you have to pay attention to is that some transactions are recorded AFTER the day the real event happened. We will illustrate the difference between SYSTEM TIME and BUSINESS TIME later in the presentation.

We start by creating the BASE table STEEN_LIFE (top frame).

In order to implement a SYSTEM TIME temporal table for this table, it is necessary to add three timestamp columns (when the row version starts, when it ends, and the transaction timestamp).

The next task is to tell DB2 that the STEEN_LIFE table is ready to be associated with a HISTORY table by adding the PERIOD SYSTEM_TIME clause, which names the columns that describe the “life of the row” (here ROW_START and ROW_END).

In order to complete the preparation, we need to create the HISTORY table to hold any UPDATED/DELETED rows from the BASE table. The easiest method is to create the history table using the LIKE clause.

The last task is to associate the BASE with the HISTORY, done using ALTER TABLE ... ADD VERSIONING.
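
The steps above can be sketched roughly as follows. STEEN_LIFE, ROW_START and ROW_END are the names used in these notes. The business columns (SSN, NAME, ADDRESS), their data types, the TRANS_ID column name and the history table name STEEN_LIFE_HIST are assumptions made for illustration, and table space clauses are left out.

```sql
-- Base table; column names and data types are assumed for illustration.
CREATE TABLE STEEN_LIFE
  (SSN      CHAR(11)     NOT NULL,
   NAME     VARCHAR(40)  NOT NULL,
   ADDRESS  VARCHAR(60)  NOT NULL);

-- Step 1: add the three timestamp columns that DB2 maintains itself.
ALTER TABLE STEEN_LIFE
  ADD COLUMN ROW_START TIMESTAMP(12) NOT NULL
      GENERATED ALWAYS AS ROW BEGIN;
ALTER TABLE STEEN_LIFE
  ADD COLUMN ROW_END TIMESTAMP(12) NOT NULL
      GENERATED ALWAYS AS ROW END;
ALTER TABLE STEEN_LIFE
  ADD COLUMN TRANS_ID TIMESTAMP(12)
      GENERATED ALWAYS AS TRANSACTION START ID;  -- TRANS_ID is an assumed name

-- Step 2: declare which columns describe the life of the row.
ALTER TABLE STEEN_LIFE
  ADD PERIOD SYSTEM_TIME (ROW_START, ROW_END);

-- Step 3: create the history table with the same columns.
-- STEEN_LIFE_HIST is an assumed name.
CREATE TABLE STEEN_LIFE_HIST LIKE STEEN_LIFE;

-- Step 4: link base and history.
ALTER TABLE STEEN_LIFE
  ADD VERSIONING USE HISTORY TABLE STEEN_LIFE_HIST;
```

From this point on the application only maintains STEEN_LIFE. DB2 writes the old row images to the history table.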

Now it's time to illustrate the outlined scenario: what happened in Steen's life, using temporal tables.

What I figured out was that you can't use DATE or regular TIMESTAMP columns in the BASE table to enable SYSTEM TIME temporal: you have to use the DB2 10 timestamp extension, TIMESTAMP(12).

I really did try to use DATE for the row START/END columns (without reading the doc first), but DB2 rejected it. It probably makes sense, since you could update the same row multiple times per day, per minute, etc.


Unlike CLONE tables and other newer features in DB2 where tables/objects are “mimicked”, the base and history table spaces can be completely different.

This example illustrates the BASE table residing in a PBR UTS table space while the HISTORY table is in a PBG table space. This absolutely makes sense, since the history table can grow a lot depending on the activity on the base object.


First event to illustrate: Steen is born 4/14-1961 but the event isn't reported by the birth clinic until 4/14-1961.

The FIRST insert only happens in the BASE table. Let's move on.

The second event is Steen being christened (and getting his official name), so the row is updated on June 22nd 1961. The previous BASE row is inserted into the HISTORY table as-is, except that its ROW_END column adopts the transaction timestamp.

We now have the BASE row with the current status and one row in the history table reflecting the previous image.
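
A minimal sketch of these first two events, assuming a base table with SSN, NAME and ADDRESS columns and a history table called STEEN_LIFE_HIST (both assumptions, as are all the data values, which are placeholders):

```sql
-- Event 1: birth. The generated columns are left out of the column list;
-- DB2 fills in ROW_START, ROW_END and the transaction timestamp itself.
INSERT INTO STEEN_LIFE (SSN, NAME, ADDRESS)
  VALUES ('000000-0000', 'UNNAMED', 'ADDRESS 1');

-- Event 2: christening. DB2 copies the previous row image to the
-- history table and sets its ROW_END before applying the update.
UPDATE STEEN_LIFE
  SET NAME = 'STEEN'
  WHERE SSN = '000000-0000';

-- Current image.
SELECT SSN, NAME, ROW_START, ROW_END
  FROM STEEN_LIFE
  WHERE SSN = '000000-0000';

-- Previous image(s).
SELECT SSN, NAME, ROW_START, ROW_END
  FROM STEEN_LIFE_HIST
  WHERE SSN = '000000-0000';
```

Note that the system timestamps reflect when the statements actually run, not the 1961 dates of the story. That limitation is what the BUSINESS TIME part of the presentation addresses.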

The next event illustrates an address move: the base row is updated with the new information and the history table gets the previous image (now holding two historical images of this specific row).

And yet another address change: the history table now holds three previous images of Steen's life.

Steen decides to move to the US after getting his Green Card. Even though he moved in late 2002, he didn't report the move until February 22nd 2003. This isn't good, since he will have to pay taxes in both Denmark and the US for the period. In order to avoid taxes in Denmark, what is needed is the ability to have the START date in the base table accept a date from the past (we will get back to this issue later, especially when dealing with BUSINESS TIME temporal tables).

As you can see, the history table keeps on growing, so this is definitely something you need to consider when designing the database.

The last event covered in this scenario is the end of Steen's life.

It is worth noting that when the row is deleted from the base table, an INSERT takes place in the history table. You could say that Steen will have eternal life thanks to the history table when using SYSTEM TIME temporal tables.
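
The delete can be sketched like this. The history table name STEEN_LIFE_HIST, the non-period column names and the SSN value are assumptions for illustration.

```sql
-- Final event: the row is deleted from the base table.
DELETE FROM STEEN_LIFE
  WHERE SSN = '000000-0000';

-- The base table no longer has a row for this key ...
SELECT COUNT(*)
  FROM STEEN_LIFE
  WHERE SSN = '000000-0000';

-- ... but every earlier image, including the one just deleted,
-- can still be read from the history table.
SELECT SSN, NAME, ADDRESS, ROW_START, ROW_END
  FROM STEEN_LIFE_HIST
  WHERE SSN = '000000-0000'
  ORDER BY ROW_START;
```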

Now that we have covered how INSERT/UPDATE/DELETE works in action for a SYSTEM TIME temporal table, let's have a close look at what to be aware of, especially when planning which temporal table type to use for the application.

In order to retrieve data, you can use normal SQL to select from either table. However, the power of TEMPORAL tables lies in some SQL extensions.

Using AS OF “timestamp” in conjunction with FOR SYSTEM_TIME, DB2 will find the appropriate row(s) for the requested time. This might mean you get the BASE row, or zero, one or more history rows. You can also use the BETWEEN predicate following the FOR SYSTEM_TIME clause.
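
As a sketch of the two forms, assuming the STEEN_LIFE table with SSN, NAME and ADDRESS columns: the timestamp values are placeholders, and in a real application they would normally be host variables or parameter markers rather than literals.

```sql
-- Point-in-time query: the row image that was current at that moment,
-- whether it now lives in the base table or in the history table.
SELECT SSN, NAME, ADDRESS, ROW_START, ROW_END
  FROM STEEN_LIFE
  FOR SYSTEM_TIME AS OF '2012-03-01-12.00.00'
  WHERE SSN = '000000-0000';

-- Range query: every row image that was current at some point in the
-- interval, so zero, one or many rows can come back for the same key.
SELECT SSN, NAME, ADDRESS, ROW_START, ROW_END
  FROM STEEN_LIFE
  FOR SYSTEM_TIME BETWEEN '2012-01-01-00.00.00'
                      AND '2012-06-30-00.00.00'
  WHERE SSN = '000000-0000'
  ORDER BY ROW_START;
```

In both cases only the base table is named in the query. DB2 decides whether the history table has to be read.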

The documentation clearly states that the timestamp columns for SYSTEM TIME temporal tables cannot be updated!

The first example illustrates that the SQL can be executed and, interestingly, is accepted by DB2, but nothing is updated.

The behavior is predictable for the BASE table: a similar UPDATE request is not accepted by DB2 (remember that the additional columns are defined as GENERATED ALWAYS). Trying to update the TEMPORAL timestamp columns correctly results in SQLCODE -151.
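
The rejected request against the base table would look something like this. The SSN column and the values are placeholders chosen for illustration.

```sql
-- ROW_START is a GENERATED ALWAYS column, so this attempt to back-date
-- the row is the kind of statement the notes report as failing with
-- SQLCODE -151 (the column cannot be updated).
UPDATE STEEN_LIFE
  SET ROW_START = '1961-04-14-00.00.00'
  WHERE SSN = '000000-0000';
```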

When the SQL extensions FOR SYSTEM_TIME AS OF or BETWEEN are used, DB2 has to check both the BASE and the HISTORY table to make sure the correct answer set is returned.

This is illustrated in this example by doing the EXPLAIN: both the base and the history table are accessed. What puzzles me is what happened to QBLOCKNO 3 and 4, given that a UNION is expected.

Notice the NON-CORRELATED SUBQUERY, which is the rewritten SQL using AS OF (more about this later, in fact on the next slide).
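
To repeat this kind of check on your own system, something along these lines could be used. It assumes a PLAN_TABLE exists under the current SQLID; the QUERYNO value, the SSN column and the timestamp are arbitrary placeholders.

```sql
-- Explain a temporal query against the base table.
EXPLAIN PLAN SET QUERYNO = 4711 FOR
  SELECT SSN, NAME, ADDRESS
    FROM STEEN_LIFE
    FOR SYSTEM_TIME AS OF '2012-03-01-12.00.00'
    WHERE SSN = '000000-0000';

-- Look at which tables DB2 actually touches and in which query blocks.
SELECT QBLOCKNO, PLANNO, TNAME, ACCESSTYPE, QBLOCK_TYPE
  FROM PLAN_TABLE
  WHERE QUERYNO = 4711
  ORDER BY QBLOCKNO, PLANNO;
```

If the history table shows up in TNAME even though the query only names the base table, that is the rewrite described above at work.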

To wrap up this topic and continue the EXPLAIN story: it isn't possible to use correlation when accessing a temporal table.

An interesting article can be found in the June/July edition of System Z Journal, written by Dan Luksetich.

Just as we have schema challenges using CLONE tables, temporal tables are not much different. The BASE and HISTORY tables must be identical.

Interestingly, DB2 lets you add a column to the base table but not to the history table. The explanation is that DB2 is pretty intelligent: the new column is automatically added to the HISTORY table, so the SYSTEM TIME design can continue uninterrupted.

There are certain changes that cannot be made with ALTER, so it is necessary to get rid of the history, do the schema changes and put the history back again (with the same changes).

The last box illustrates that it is possible to INSERT into the history table, which will be helpful for reloading the existing history data after the schema changes are implemented.

If schema changes are needed and these are not supported, the method is:

1) DROP VERSIONING to cut the link between BASE and HISTORY
2) Unload the HISTORY and drop the HISTORY (alternatively, execute the schema changes for the history table)
3) Do the same changes and tasks for BASE
4) Get the data back if the object has been dropped
5) ADD VERSIONING.

Attempting schema changes that are not allowed while versioning is in effect will result in SQLCODE -478 / SQLCODE -669.
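
The bracketing statements for that procedure might look like the sketch below. The history table name STEEN_LIFE_HIST is an assumption, and the schema change itself is deliberately left as a placeholder since it depends on what needs to be altered.

```sql
-- 1) Cut the link between base and history.
ALTER TABLE STEEN_LIFE DROP VERSIONING;

-- 2) Unload and drop/recreate the history table, or alter it in place.
-- 3) Apply the same change to the base table.
-- 4) Reload any data that was unloaded.
--    (site-specific DDL and utility jobs go here)

-- 5) Re-establish versioning once both tables match again.
ALTER TABLE STEEN_LIFE
  ADD VERSIONING USE HISTORY TABLE STEEN_LIFE_HIST;
```

While versioning is dropped, updates and deletes on the base table are not recorded in the history table, so the window between steps 1 and 5 should be kept short.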

Another limitation you need to be aware of, especially when considering growth and the partitioning scheme: it is not possible to ADD / ROTATE partitions.

If this is needed, please review the previous slides on UN-ALTERABLE changes.

Let us have a close look at BUSINESS TIME temporal tables and what the differences are. We will basically cover the same scenario of “Steen's life” with a couple of modifications.


We will be using the exact same base table definition as for the SYSTEM TIME temporal table scenario.

In this example the base table is already created, so I have to use NOT NULL WITH DEFAULT for the new columns (unless dropping and recreating the table). This is done in order to illustrate that it's possible to enable TEMPORAL for existing tables.

Unlike SYSTEM TIME temporal tables, it's possible to specify data types for the START / END columns other than TIMESTAMP(12) or ZONE: we can use DATE instead, and the columns don't have to be GENERATED ALWAYS.

Another difference is how temporal is enabled: using ADD PERIOD BUSINESS_TIME as opposed to SYSTEM_TIME.
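
A minimal sketch of enabling BUSINESS TIME on an existing table, assuming a STEEN_LIFE table that does not yet have any period columns. STA_DATE is the column name that appears later in these notes; END_DATE is an assumed name for its counterpart.

```sql
-- Add ordinary DATE columns; nothing is GENERATED here, the
-- application supplies the values.
ALTER TABLE STEEN_LIFE
  ADD COLUMN STA_DATE DATE NOT NULL WITH DEFAULT;
ALTER TABLE STEEN_LIFE
  ADD COLUMN END_DATE DATE NOT NULL WITH DEFAULT;  -- END_DATE is an assumed name

-- Tell DB2 that these two columns form the business period.
ALTER TABLE STEEN_LIFE
  ADD PERIOD BUSINESS_TIME (STA_DATE, END_DATE);
```

No history table and no ADD VERSIONING step are involved for a pure BUSINESS TIME design.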

Unlike SYSTEM TIME temporal tables, BUSINESS TIME temporal tables don't need a history table, unless BI-TEMPORAL is needed. In this scenario we're using a clean BUSINESS TIME design, not BI-TEMPORAL!

One important issue to think about: since we don't have a history table, the BASE table can have multiple rows for the same key. This means you will need to modify how the UNIQUE indexes are defined. For this purpose, DB2 allows duplicates within the key if you create the index with WITHOUT OVERLAPS.

As mentioned on the previous slide, one UNIQUE key value can have multiple entries due to different “valid timeframes”. Using WITHOUT OVERLAPS allows the UNIQUE key to have multiple entries as long as the START/END dates/timestamps don't overlap. Pretty neat.

The index has to be specified with the “normal unique column(s)” followed by the clause BUSINESS_TIME WITHOUT OVERLAPS (the first box).

As always, restrictions apply: this kind of index cannot be specified as PARTITIONED.

The second box illustrates that the SSN column used to be unique but can now contain duplicates as long as WITHOUT OVERLAPS is specified, meaning one entity can exist only once at any specific point in time.
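
The index definition described above could look like this sketch. SSN is the key column mentioned in the notes; the index name is an assumption.

```sql
-- SSN alone is no longer unique, but SSN combined with a
-- non-overlapping business period is.
CREATE UNIQUE INDEX STEEN_LIFE_IX1   -- index name is illustrative
  ON STEEN_LIFE
     (SSN, BUSINESS_TIME WITHOUT OVERLAPS);
```

With this index in place, an insert or update that would give the same SSN two rows valid on the same date is rejected as a duplicate.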

The first event for the BUSINESS TIME temporal scenario is the birth of Steen. Unlike with SYSTEM TIME, we can dictate the START and END dates/timestamps, so here we basically BACK-DATE the event.

In this case the START / END columns are defined as NOT NULL WITH DEFAULT (NNWD). This is something you will have to consider during the design, since the application might not want DEFAULT / CURRENT values for DATE / TIMESTAMP.
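
A back-dated insert might be sketched as follows. The birth date is the one from the story; the END_DATE column name, the open-ended '9999-12-31' value and the other data values are assumptions for illustration.

```sql
-- The application supplies the period itself, so the row can start
-- on the actual birth date no matter when the INSERT is executed.
INSERT INTO STEEN_LIFE (SSN, NAME, ADDRESS, STA_DATE, END_DATE)
  VALUES ('000000-0000', 'UNNAMED', 'ADDRESS 1',
          '1961-04-14', '9999-12-31');
```

Supplying both dates explicitly avoids relying on the column defaults mentioned above.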

The second event is where Steen is given a name.

The table is updated as expected. The difference between BUSINESS TIME and SYSTEM TIME is that no history table is involved. Instead, the BASE table gets additional row(s) depending on the START/END dates specified in the UPDATE statement.

The cool thing about BUSINESS TIME temporal tables is that you can even specify events in the future, and even in the past if needed (we will see this later).

In order to illustrate the power of BUSINESS TIME temporal tables, we will tweak the Steen story a little bit: Steen doesn't report the address change during the divorce, where he would have had to pay more taxes to the municipality.

Also, when he moves to the US, he wants the actual date to be reflected on the tax report, not the date he reported the move from Denmark to the US.

Let's see . . .

The third event is pretty straightforward: just a new address, resulting in another row with a new START date.

But there's a twist. Since the address was reported after the fact, we need to make sure this is reflected (meaning an update in the past). For this purpose you need to specify WHEN the row content is valid, hence another new piece of syntax: FOR PORTION OF BUSINESS_TIME dictates when the content is valid. This specific update results in TWO rows instead of one (or two instead of three, or even three instead of one in case the end date also changes).

The next event is where Steen gets divorced. He reports an address change from when he gets his new apartment, which in fact is AFTER he moved out from his ex-wife.

The “trick” here is that Steen reports the address change AFTER THE FACT, so we need to specify this as a past event, hence using FOR PORTION OF to dictate the date range in which the new information is valid.

So Steen didn't report WHEN his divorce started (he wanted to save tax), but something went south and the ex-wife reported the address change. Bottom line: the tax department decides to change the address for a very specific timeframe, hence the update covering the PORTION OF BUSINESS_TIME from October 30 2002 until April 30 2003.

This results in TWO new rows in the table.
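
The tax department's correction could be sketched like this. The date range is the one given in the notes; the ADDRESS and END_DATE column names and the data values are assumptions.

```sql
-- Change the address only for the stated slice of business time.
-- The FROM date is included in the period, the TO date is not.
UPDATE STEEN_LIFE
  FOR PORTION OF BUSINESS_TIME
      FROM '2002-10-30' TO '2003-04-30'
  SET ADDRESS = 'ADDRESS 4'
  WHERE SSN = '000000-0000';

-- Inspect how DB2 has split the rows for this key.
SELECT SSN, ADDRESS, STA_DATE, END_DATE
  FROM STEEN_LIFE
  WHERE SSN = '000000-0000'
  ORDER BY STA_DATE;
```

Any existing row that only partly falls inside the range is split by DB2: the part inside the range gets the new address, and the parts outside keep the old content with adjusted start or end dates. That splitting is where the extra rows come from.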

Just as covered for the HISTORY table in SYSTEM TIME, using BUSINESS TIME might cause a lot of new entries. PLAN ACCORDINGLY!

Let's have a quick look at how SQL retrieval works for BUSINESS TIME temporal tables.

In the previous SYSTEM TIME scenario we used AS OF and BETWEEN. The exact same keywords can be used for BUSINESS TIME.

If you want to use GT / LT etc., these operators cannot be used in conjunction with FOR BUSINESS_TIME but have to be coded as “normal” predicates. It can be done!

Just as AS OF can be used, using FOR BUSINESS_TIME BETWEEN two dates / timestamps is absolutely valid. Just keep in mind that if AS OF isn't used, you might get zero, one, two or many rows.

As mentioned earlier, a normal BETWEEN predicate on date/timestamp columns is valid and can be used as well.
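
The three retrieval styles can be sketched as follows, assuming DATE period columns STA_DATE and END_DATE (END_DATE being an assumed name) and placeholder values.

```sql
-- Which address was valid on a given business date?
SELECT SSN, ADDRESS, STA_DATE, END_DATE
  FROM STEEN_LIFE
  FOR BUSINESS_TIME AS OF '2003-01-15'
  WHERE SSN = '000000-0000';

-- Every row valid at some point in a date range (may return many rows).
SELECT SSN, ADDRESS, STA_DATE, END_DATE
  FROM STEEN_LIFE
  FOR BUSINESS_TIME BETWEEN '2002-01-01' AND '2003-12-31'
  WHERE SSN = '000000-0000'
  ORDER BY STA_DATE;

-- Greater-than / less-than comparisons go into ordinary predicates
-- on the period columns instead of the FOR BUSINESS_TIME clause.
SELECT SSN, ADDRESS, STA_DATE, END_DATE
  FROM STEEN_LIFE
  WHERE SSN = '000000-0000'
    AND STA_DATE > '2002-01-01';
```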

Now on to some “interesting aspects discovered”. Personally I don't know if these are bugs or working as designed (WAD). Issues have not been opened with IBM, and the RSU is early 2012.

This SQL SELECT statement seems invalid (no COMMA between STA_DATE and SUBSTR), yet it still returns a “valid” result set! Might not be an issue, but interesting.

You don't have to use FOR BUSINESS_TIME when retrieving rows. Normal predicates are still valid, but think about what you are asking for: maybe only one row is expected but several are returned!

The next event is Steen's passing, so we delete him from the table. Let's look at the major difference between SYSTEM TIME and BUSINESS TIME.

For SYSTEM TIME the BASE table was empty, but the HISTORY table had all past changes.

For BUSINESS TIME there's NOTHING left: everything is cleaned up. This is another issue to think about when implementing TEMPORAL tables, and maybe a reason to use BI-TEMPORAL.

It's time to check what DB2 really does under the covers to keep track of all these moving parts.

When you ALTER the table to add VERSIONING or BUSINESS_TIME, DB2 creates constraints in the DB2 catalog, keeping track of what can and can NOT be done and how to operate when updates/deletes happen.


Another look at SYSCHECKS, describing WHICH columns and tables are involved.
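
To look at these generated constraints yourself, a catalog query along these lines could be used. The column list reflects the SYSIBM.SYSCHECKS layout as I understand it, so verify it against the catalog documentation for your DB2 level.

```sql
-- List the check constraints recorded for the temporal table.
SELECT TBOWNER, TBNAME, CHECKNAME, CHECKCONDITION
  FROM SYSIBM.SYSCHECKS
  WHERE TBNAME = 'STEEN_LIFE';
```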


Time to start wrapping up. There are TWO different (in fact three) types of TEMPORAL tables, and which one to choose really depends on the application, the design and how the application should behave. Pick the one that FITS, and make sure the application does what's needed.

Also, be careful when current applications are converted: there might be TRIGGERS, stored procedures, etc. which might make things more difficult.


For utilities, there isn't a lot to be aware of compared to CLONE tables, NOT LOGGED objects, etc. Just make sure your history tables are ready for recovery and plan schema changes accordingly.


May 16, 2010. Copyright © 2010 CA
